Role Mining with Probabilistic Models 



Mario Frank (1), Joachim M. Buhmann (2), David Basin (2)
(1) UC Berkeley, Computer Science Division
(2) ETH Zurich, Department of Computer Science

January 8, 2013 

Abstract 

Role mining tackles the problem of finding a role-based access control (RBAC) con- 
figuration, given an access-control matrix assigning users to access permissions as input. 
Most role mining approaches work by constructing a large set of candidate roles and use 
a greedy selection strategy to iteratively pick a small subset such that the differences 
between the resulting RBAC configuration and the access control matrix are minimized. 
In this paper, we advocate an alternative approach that recasts role mining as an infer- 
ence problem rather than a lossy compression problem. Instead of using combinatorial 
algorithms to minimize the number of roles needed to represent the access-control ma- 
trix, we derive probabilistic models to learn the RBAC configuration that most likely 
underlies the given matrix. 

Our models are generative in that they reflect the way that permissions are assigned 
to users in a given RBAC configuration. We additionally model how user-permission 
assignments that conflict with an RBAC configuration emerge and we investigate the
influence of constraints on role hierarchies and on the number of assignments. In experiments
with access-control matrices from real-world enterprises, we compare our proposed
models with other role mining methods. Our results show that our probabilistic models 
infer roles that generalize well to new system users for a wide variety of data, while other 
models' generalization abilities depend on the dataset given. 

1 Introduction 

Role-Based Access Control (RBAC) [T^] is a popular access control model. Rather than 
directly assigning users to permissions for using resources, for example via an access control 
matrix, in RBAC one introduces a set of roles. The roles are used to decompose a user- 
permission relation into two relations: a user-role relation that assigns users to roles and 
a role-permission relation that assigns roles to permissions. Since roles are (or should be) 
natural abstractions of functional roles within an enterprise, these two relations are concep- 
tually easier to work with than a direct assignment of users to permissions. Experience with 
RBAC indicates that this decomposition works well in practice and facilitates the adminis- 
tration of large-scale authorization policies for enterprises with many thousands of users and 
permissions. 

Although the benefits of using RBAC are widely recognized, its adoption and adminis- 
tration can be problematic in practice. Adoption requires that an enterprise migrates its 
authorizations to RBAC. Additionally, even after RBAC is in place, authorizations may need 
to be reassigned after major changes within the enterprise, for example, after a reorganization 
or a merger where processes and IT systems from different divisions must be consolidated. 
Migration and the need to reengineer roles after major changes are the most expensive aspects 
of RBAC. 

To address these problems, different approaches have been developed for configuring RBAC
systems. These approaches have been classified into two kinds [37]: top-down and bottom-up
role engineering. Top-down engineering configures RBAC independently of any existing
user-permission assignments by analyzing the enterprise's business processes and its security
policies. Bottom-up role engineering uses existing user-permission assignments to find a
suitable RBAC configuration.

When carried out manually, bottom-up role engineering is very difficult and therefore 
incurs high costs and poses security risks. To simplify this step, [24] proposed the first
automatic method for bottom-up role engineering and coined the term role mining for such 
methods. Numerous role mining algorithms have been developed since then, notably [SJ 1351 

Unfortunately, most role mining algorithms suffer from the drawback that the discovered 
roles are artificial and unintuitive for the administrators who must assign users to roles. This 
undesirable effect is linked to the design principle for role mining algorithms that aim at 
achieving one of the following two goals: 

• minimizing the deviation between the RBAC configuration and the given access control 
matrix for a given number of roles, or 

• minimizing the number of roles for a given deviation (possibly zero). 

Both goals amount to data compression. As a result, the roles found are often synthetic sets 
of permissions. Although these roles minimize the RBAC configuration, they are difficult to 
interpret as functional roles, also called business roles, that reflect the business attributes 
of the users assigned to them. This shortcoming not only limits the acceptance of the new 
RBAC configuration among employees, it also makes the RBAC configuration difficult to 
maintain, for example, to update when users join the system or change their business role 
within the enterprise. Moreover, by optimizing the roles to fit existing authorizations as closely
as possible, existing erroneous user-permission assignments are likely to be migrated to the
new RBAC system. Several attempts have been made to incorporate business attributes into 
the role mining process to improve the business-relevance of the roles [5S1[T5]. However, these 
algorithms compress the access control matrix as the underlying objective and they do not 
necessarily infer predictive roles. 

The problems outlined above arise because the predominant definitions of the role mining 
problem [SHI UHl [13 123 do not reflect realistic problem settings. In particular, all prior 
definitions assume the existence of input that is usually not available in practice. For example, 
the input includes either the number of roles to be found or the maximally tolerable residual 
error when fitting the RBAC configuration to the given access-control matrix. Guided by 
these definitions, the respective algorithms focus on compressing user-permission assignments 
rather than on structured predictions of roles. However, compression addresses the wrong 
problems. We argue that there is a need for a definition of role mining as an inference problem 
that is based on realistic assumptions. Consequently, given such a definition, algorithms 
must be developed that aim at solving this problem. Moreover, it is necessary to have quality
measures to compare how well different algorithms solve the problem. Our contributions 
cover all these aspects: 

• We provide the first complete approach to probabilistic role mining, including the prob- 
lem definition, models and algorithms, and quality measures. 

• We define role mining as an inference problem: learning the roles that most likely 
explain the given data. We carefully motivate and explicate the assumptions involved. 

• We propose probabilistic models for finding the RBAC configuration that has most 
likely generated a given access-control matrix. 

• We demonstrate that the RBAC configuration that best solves the role inference prob- 
lem contains roles that generalize well to a set of hold-out users. We therefore provide a 
generalization test to evaluate role mining algorithms and apply this test to our methods 
and to competing approaches. 

• We experimentally demonstrate that our probabilistic approach provides a sound con- 
fidence estimate for given user-permission assignments. 

• We develop a hybrid role mining algorithm that incorporates business attributes into
the role mining process and thereby improves the interpretability of roles.

Figure 1: Graphical illustration of the decomposition of an access control matrix X, which
assigns N users to D permissions, into an RBAC configuration with two assignments: Z
assigns users to K roles and U assigns roles to permissions. The reconstruction Z * U fits X
up to residuals. The Boolean matrices X, Z, and U encode the relations UPA, UA, and PA,
respectively.

We proceed as follows. First, in Section 2, we review the assumptions that are often
implicitly made for role mining and we use them to define the inference role mining problem
together with appropriate evaluation criteria. Then, in Section 3, we derive a class of
probabilistic models from the deterministic user-permission assignment rule of RBAC. We explain
two model instances in detail and we propose learning algorithms for them in Section 4. In
Section 5, we report on experimental findings on real-world access control data. In Section 6,
we show how to include business attributes of the users in the role mining process to obtain
a hybrid role mining algorithm and we experimentally investigate how these attributes
influence the RBAC configurations discovered. We discuss related work in Section 7 and draw
conclusions in Section 8.

2 Problem definition 

In this section we define the role inference problem. First, we explain the assumptions 
underlying our definition. Afterwards we present a general definition of role mining that 
takes business-relevant information into account. The pure bottom-up problem without such 
information will be a special case of the general problem. Finally, we propose quality measures 
for assessing role mining algorithms. 

2.1 Assumptions and problem definition 

The role mining problem is defined in terms of a set of users USERS, a set of permissions 
PRMS, a user-permission assignment relation UPA, a set of roles ROLES, a user-role as- 
signment relation UA, a role-permission assignment relation PA, and, if available, top-down 
information TDI. Our problem definition is based on three assumptions about the relation- 
ships between these entities and the generation process of UPA. All these entities and their 
relationships are sketched in Figure 2.

Assumption 1: An underlying RBAC configuration exists. We assume that the
given user-permission assignment UPA was induced by an unknown RBAC configuration
RC* = (ROLES*, UA*, PA*), where "induced" means that UPA ≈ UA* * PA*. This assumption
is at the heart of role mining. To search for roles implicitly assumes that they are there to 
be found. Said another way, searching for roles in direct assignments between users and 
permissions only makes sense if one assumes that parts of the data could, in principle, be 
organized in such a structured way. Without this assumption, the role structure that one can 
expect to find in a given access-control matrix would be random and therefore meaningless 
from the perspective of the enterprise's business processes and security policies. 

Assumption 2: Top-down information TDI influences the RBAC configuration RC*. We
assume that RC* reflects the enterprise's security policies and the business processes
in the sense that RC* encodes these policies and enables users to carry out their
business tasks. Full knowledge of the security policies and business processes, as well as 
all business attributes and user tasks, should, in principle, determine the system's user- 
permission assignment. We denote all such information as top-down information (TDI); this 
name reflects that the task of configuring RBAC using TDI is usually referred to as top-down 
role mining. In practice, only parts of TDI may be available to the role mining algorithm. In 
Figure 2, we account for this structure by distinguishing "TDI" from "TDI input". Whenever
some parts of TDI are used in the role mining process, we speak of hybrid role mining.



Assumption 3: Exceptions exist. The observed user-permission assignment UPA might
contain an unknown number of exceptions. An exception is a user-permission assignment that 
is not generated by the role-structure RC* but rather from a set of unknown perturbation 
processes. To capture this assumption in Figure 2, we distinguish the given access control
matrix UPA ≈ UA* * PA* from an unknown exception-free matrix UPA' = UA* * PA* that
is fully determined by the role structure. We emphasize that, even though there are many 
ways errors can arise in a system, exceptions are not necessarily errors. Moreover, a role 
mining algorithm cannot be expected to discriminate between an error (that is, an unintended 
exception) and an intentionally made exception, if additional information is not provided. A 
role mining algorithm can identify exceptions and report them to a domain expert (ideally 
ranked by their likelihood). Such a procedure already provides a substantial advantage over 
manually checking all user-permission pairs as it involves far fewer checks. Due to the lack of 
additional information, we abstain from making further assumptions about the exceptions, 
for instance the fraction of exceptions δ. Instead, determining such parameters will constitute
an important part of the role mining problem. 

We now propose the following definition of the role mining problem. 
Definition 1. Role inference problem 

Let a set of users USERS, a set of permissions PRMS, a user-permission relation UPA, 
and, optionally, parts of the top-down information TDI be given. Under Assumptions 1-3, 
infer the unknown RBAC configuration RC* = (ROLES*, UA*, PA*).

This definition, together with the assumptions on UPA's generation process, provides
a unified view of bottom-up and hybrid role mining. The two cases only differ in terms
of the availability of top-down information TDI. In hybrid role mining, parts of the
top-down information that influenced RC* are available. When TDI is not provided, the problem
reduces to bottom-up role mining. Note that in such cases the goal still remains the same: the
solution to Problem 1 solves the bottom-up problem as well as the hybrid role mining problem.
Thereby, the assumption that RC* is (partially) influenced by TDI is also reasonable for the
pure bottom-up role mining problem. Whether TDI actually influences RC* does not depend
on the availability of such data.



[Figure 2 diagram; node labels: USERS, PRMS, Top-down Information (TDI), TDI input,
RC* = (ROLES*, UA*, PA*), UPA', + perturbations]

Figure 2: Dependencies of the entities involved in role mining. This scheme illustrates how
the user-permission assignment UPA is assumed to be generated. An arrow indicates the
direction of the generation process. Grey entities are unknown and black ones are given as
input. For pure bottom-up role mining, no top-down information is given.



Relationship to role mining by compression. The role mining problems that aim to
achieve the closest fit for a given compression ratio or achieve the best compression for a
given deviation differ from the role inference problem. Technically, our definition has less
input than the alternatives. For the above-mentioned problems, either the number of roles
k or the deviation δ is provided as input. In contrast, our definition makes no assumptions
on these quantities and both k and δ must be learned from UPA (as finding RC* involves
finding k and, at the same time, determines δ = UPA - UA* * PA*).

Moreover, we see it as an advantage that the assumptions of the problem are explicitly
given. By making the assumption of an underlying role structure a condition for role mining, 
our problem definition favors conservative algorithms in the following sense. If little or no 
structure exists in UPA, the optimal algorithm should refrain from artificially creating too 
many roles from UPA. In contrast, if the number of roles or the closeness of fit is predeter- 
mined, optimal algorithms will migrate exceptional (and possibly unwarranted) permissions 
to RBAC. 

Finally, it is unrealistic in practice that δ or k will be given as an input. Hence, treating
them as unknowns reflects real-world scenarios better than previous definitions for role mining
that require either δ or k as inputs.

2.2 Quality measures 

Comparison with true roles. The obvious quality measure that corresponds to the role
inference problem is the distance to the hidden RBAC configuration RC* underlying the 
given data UPA. Several distance metrics for comparing two RBAC configurations exist, 
for instance the Hamming distance between the roles or the Jaccard similarity between the 
roles. Usually, however, RC* is not known in practice. A comparison is possible only for 
artificially created user-permission assignments where we know RC* or when an existing 
RBAC configuration is used to compute a user-permission assignment. We therefore focus in 
this paper on quality measures that are applicable to all access control matrices, independent 
of knowledge of RC* . 
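
As a small illustration of the two role distances named above, the following Python sketch compares two roles encoded as 0/1 permission vectors. The function names `role_hamming` and `role_jaccard` and the example roles are hypothetical. How the roles of two configurations are matched to each other before comparison is not specified in the text and is left out here.

```python
def role_hamming(role_a, role_b):
    """Number of permissions on which two roles disagree."""
    return sum(a != b for a, b in zip(role_a, role_b))


def role_jaccard(role_a, role_b):
    """Shared permissions divided by permissions granted by either role."""
    shared = sum(1 for a, b in zip(role_a, role_b) if a and b)
    either = sum(1 for a, b in zip(role_a, role_b) if a or b)
    # Two empty roles are treated as identical (a convention chosen here).
    return shared / either if either else 1.0


# Hypothetical example: a "true" role and a mined role over four permissions.
true_role = [1, 1, 0, 0]
mined_role = [1, 0, 1, 0]
distance = role_hamming(true_role, mined_role)
similarity = role_jaccard(true_role, mined_role)
```

The Hamming distance counts disagreements in absolute terms, while the Jaccard similarity normalizes by the size of the roles, so the two measures can rank candidate roles differently.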

Generalization error. We propose to use generalization error for evaluating RBAC
configurations. The generalization error is often used to assess supervised learning methods for
prediction f5TI . The generalization error of an RBAC configuration RC that has been learned 
from an input dataset $X^{(1)}$ indicates how well RC fits to a second dataset $X^{(2)}$ that has been
generated in the same way as $X^{(1)}$.

Computing the generalization error for an unsupervised learning problem like role mining 
is conceptually challenging. In general, it is unclear how to transfer the inferred roles to a 
hold-out test dataset when no labels are given that indicate a relationship between roles and 
users. We employ a method that can be used for a wide variety of unsupervised learning 
problems: the transfer costs proposed in [15]. The transfer costs of a role mining algorithm
are computed as follows. First, the input dataset is randomly split along the users into a
training set $X^{(1)}$ and a validation set $X^{(2)}$. Then the role mining algorithm learns the
RBAC configuration RC = (Z, U) based only on the training set and without any knowledge of
the second hold-out dataset. Having learned RC, the solution is transferred to the hold-out
dataset by using a nearest neighbor mapping between the users of both datasets. Each user
in $X^{(2)}$ is assigned to the set of roles of its nearest neighbor user in $X^{(1)}$. Technically,
this means we keep U fixed and create a new assignment matrix Z', where row i is copied from
the row in Z that corresponds to the nearest neighbor user of user i. Then the generalization
error is the Hamming distance between Z' * U and $X^{(2)}$ divided by the number of entries of
$X^{(2)}$. This ratio denotes the fraction of erroneously generalized assignments.
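
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the nearest neighbor is assumed to be measured in Hamming distance on the permission rows, and ties are broken by taking the first training user found.

```python
def boolean_product(Z, U):
    """Boolean matrix product: entry (i, d) is OR over k of Z[i][k] AND U[k][d]."""
    return [[int(any(z_i[k] and U[k][d] for k in range(len(U))))
             for d in range(len(U[0]))] for z_i in Z]


def hamming(a, b):
    """Number of positions in which two equally long 0/1 rows differ."""
    return sum(x != y for x, y in zip(a, b))


def transfer_generalization_error(X_train, Z_train, U, X_test):
    """Fraction of entries of X_test that Z' * U predicts wrongly."""
    Z_new = []
    for row in X_test:
        # Assumption: nearest neighbor in Hamming distance, first match on ties.
        nearest = min(range(len(X_train)), key=lambda j: hamming(row, X_train[j]))
        Z_new.append(Z_train[nearest])  # copy the neighbor's role assignments
    X_pred = boolean_product(Z_new, U)  # U is kept fixed
    errors = sum(hamming(p, t) for p, t in zip(X_pred, X_test))
    return errors / (len(X_test) * len(X_test[0]))


# Hypothetical toy data: two roles over four permissions.
U = [[1, 1, 0, 0], [0, 0, 1, 1]]
Z_train = [[1, 0], [0, 1], [1, 1]]
X_train = boolean_product(Z_train, U)
X_test = [[1, 1, 0, 0], [0, 0, 1, 0]]  # second user lacks one role permission
error = transfer_generalization_error(X_train, Z_train, U, X_test)
```

In this toy case the second test user is mapped to the second training user, whose role grants one permission the test user does not hold, so one of the eight test entries is predicted wrongly.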

The rationale behind our measure is intuitive: Since the input dataset is assumed to be 
generated by an unknown RBAC configuration RC*, the subsets $X^{(1)}$ and $X^{(2)}$ have also
been generated by RC*. The structure of the input matrix is the same in different subsets of
the dataset, but the random exceptional assignments are unique to each user. A role mining 
algorithm that can infer RC* from one subset, will have a low generalization error because the 
generating RBAC configuration should generalize best to the data that it has generated. As 
a consequence, a role mining algorithm that overfits to noise patterns in one dataset will fail 
to predict the structure of the second dataset. Thereby, even a perfect role mining algorithm 
will have a positive generalization error if the data is noisy. It is the relationship to the 
generalization error of other algorithms that counts. All methods fail to predict exceptional
assignments but the better methods will succeed in identifying the structure of the hold- 
out data while the inferior methods will compromise the underlying structure to adapt to 
exceptions. 

An advantage of computing the generalization error as described above is that this com- 
putation is agnostic to the role mining algorithm used. In particular it works for both 
probabilistic and combinatorial methods. For probabilistic methods one could achieve im- 
proved results by using a posterior inference step to assign test users to the roles discovered. 
However, this method is tailored to the particular methods employed and would not work for 
methods without a probabilistic model. 

3 From a deterministic rule to a class of probabilistic 
models 

In this section we propose a class of probabilistic models for role mining. We derive our core 
model from the deterministic assignment rule of RBAC and extend this core model to more 
sophisticated models. We will present two such extensions: (1) the disjoint-decomposition 
model (DDM) with a two-level role hierarchy where each user has only one role and (2) multi- 
assignment clustering (MAC), a flat RBAC model featuring a role relationship where users 
can assume multiple roles. We will also show why these two instances of the model class are 
particularly relevant for role mining. 

3.1 Core model 

In the following, we derive the core part of our probabilistic model. We start with the 
deterministic rule that assigns users to permissions based on a given role configuration. We 
then convert this rule into a probabilistic version, where one reasons about the probability 
of observing a particular user-permission assignment matrix given the probabilities of users 
having particular roles and roles entailing permissions. 

We denote the user-permission assignment matrix by $X$, with $X \in \{0,1\}^{N \times D}$. As
shorthand, we write $x_{i*}$ for the $i$-th row of the matrix and $x_{*d}$ for the $d$-th column. We
define the generative process of an assignment $x_{id} \in \{0,1\}$ by

$$ u_{kd} \sim p(u_{kd} \mid \beta_{kd}) \tag{1} $$

$$ x_{id} \sim p(x_{id} \mid u_{*d}, z_{i*}) , \tag{2} $$

where $a \sim p(a)$ means that $a$ is a random variable drawn from the probability distribution
$p(a)$. The latent variable $u_{kd} \in \{0,1\}$ determines permission $d \in \{1,\dots,D\}$ of source
$k \in \{1,\dots,K\}$. The parameter $z_{ik} \in \{0,1\}$ encodes whether user $i$ is assigned to role $k$.
As $u_{kd}$ is binary, $p(u_{kd} \mid \beta_{kd})$ is a Bernoulli distribution, with $p(u_{kd} = 0) := \beta_{kd}$ and

$$ p(u_{kd} \mid \beta_{kd}) := \beta_{kd}^{1-u_{kd}} (1-\beta_{kd})^{u_{kd}} . \tag{3} $$

Throughout this section, we will condition all probabilities on Z. Therefore, we can ignore
$p(z_{ik})$ for the moment and treat it as a model parameter here. In Section 3.3, we will treat
$z_{ik}$ as a random variable and describe a particular prior distribution for it.

The generative model that we described so far is illustrated in Figure 3. All entities in this
figure have a well-defined semantic meaning. The circles are random variables. A filled circle
denotes that the variable is observable (like the user-permission assignment $x_{id}$) and an empty
circle represents a hidden variable. Small solid dots are unknown model parameters. Arrows
indicate statistical dependencies between entities. Whenever an entity is in a rectangle (say
in the N-rectangle and in the D-rectangle) then multiple different realizations of this entity
exist (here $N \cdot D$ realizations of $x_{id}$, each with a different index i and d).

Figure 3: Graphical model corresponding to the generation rule of user-permission assignments
given an RBAC configuration. See the explanation of the semantics in the text.

The probability $p(x_{id} \mid u_{*d}, z_{i*})$ is deterministic in the following sense. Given all
role-permission assignments $u_{kd}$ and the role assignments $z_{i*}$, the bit $x_{id}$ is determined by
the disjunction rule defined by the Boolean matrix product

$$ X = Z * U \quad \text{such that} \quad x_{id} = f(u_{*d}, z_{i*}) := \bigvee_{k=1}^{K} (z_{ik} \wedge u_{kd}) . \tag{4} $$
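
The generative process of Eqs. (1) to (4) can be sketched in plain Python as follows. The function names `sample_roles` and `boolean_product` and the toy matrices are hypothetical and only serve as an illustration under the definitions above.

```python
import random


def sample_roles(beta, rng):
    """Draw u_kd with p(u_kd = 0) = beta_kd, as in Eqs. (1) and (3)."""
    return [[0 if rng.random() < b else 1 for b in row] for row in beta]


def boolean_product(Z, U):
    """Eq. (4): x_id = OR over k of (z_ik AND u_kd)."""
    return [[int(any(z_i[k] and U[k][d] for k in range(len(U))))
             for d in range(len(U[0]))] for z_i in Z]


# Hypothetical example with N = 4 users, K = 2 roles, D = 3 permissions.
Z = [[1, 0], [0, 1], [1, 1], [0, 0]]
U = [[1, 1, 0], [0, 1, 1]]
X = boolean_product(Z, U)

# Sampling U instead of fixing it: beta close to 1 makes a permission unlikely.
rng = random.Random(0)
U_sampled = sample_roles([[0.1, 0.1, 0.9], [0.9, 0.1, 0.1]], rng)
X_sampled = boolean_product(Z, U_sampled)
```

A user with both roles (third row of Z) receives the union of the roles' permissions, and a user with no role (fourth row) receives none.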

To derive the likelihood of $x_{id}$, we express this deterministic formula $x_{id} = f(u_{*d}, z_{i*})$
in terms of a probability distribution. To this end, the entire probability mass must be
concentrated at the deterministic outcome, i.e. the distribution must be of the form

$$ p(x_{id} \mid u_{*d}, z_{i*}) = \begin{cases} 1 & \text{if } x_{id} = f(u_{*d}, z_{i*}) , \\ 0 & \text{otherwise} . \end{cases} \tag{5} $$

A probability distribution that fulfills this requirement is

$$ p(x_{id} \mid u_{*d}, z_{i*}) = \left[ \prod_k (1 - u_{kd})^{z_{ik}} \right]^{1 - x_{id}} \left[ 1 - \prod_k (1 - u_{kd})^{z_{ik}} \right]^{x_{id}} . \tag{6} $$

This can be seen by going through all eight combinations (for a single k) of the binary
values of $z_{ik}$, $x_{id}$, and $u_{kd}$. The distribution reflects that there are only two possible
outcomes of this random experiment. In the following, we exploit this property and work just
with probabilities for $x_{id} = 0$. The probability for $x_{id} = 1$ is always the remaining
probability mass.
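
The check over all eight combinations can also be done mechanically. The sketch below assumes that Eq. (6), for a single role k, has the form $[(1-u)^{z}]^{1-x} [1-(1-u)^{z}]^{x}$ with the convention $0^0 = 1$, and compares it with the indicator of Eq. (5). The names `f` and `p_eq6` are hypothetical.

```python
from itertools import product


def f(u, z):
    """Disjunction rule of Eq. (4) for a single role: x = z AND u."""
    return int(z and u)


def p_eq6(x, u, z):
    """Assumed form of Eq. (6) for K = 1; Python evaluates 0 ** 0 as 1."""
    q = (1 - u) ** z  # equals 1 exactly when the role does not grant the permission
    return q ** (1 - x) * (1 - q) ** x


results = {(x, u, z): p_eq6(x, u, z) for x, u, z in product((0, 1), repeat=3)}
all_match = all(p == (1 if x == f(u, z) else 0) for (x, u, z), p in results.items())
```

For every combination the expression is 1 when x equals the deterministic outcome and 0 otherwise, which is the requirement stated in Eq. (5).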

The model in its current deterministic state is not directly useful for role mining given the
hidden variables $u_{kd}$. We therefore eliminate the $u_{kd}$ to obtain a likelihood that only depends
on the model parameters and the observations. This can be achieved by marginalizing out
$u_{kd}$, i.e., summing over all possible $K \times D$ matrices u. As derived in Appendix A.1, this yields
the likelihood $p(x_{id} = 0 \mid \beta_{*d}, z_{i*}) = \prod_k \beta_{kd}^{z_{ik}}$. This term reflects that if a user is
not assigned to a role, then the role does not have any influence on the user's permissions.
Therefore, the chance of a user not being assigned some permission d' decreases with the number
of roles of the user, since the chances $\beta_{kd'}$ of d' not being assigned to the roles are multiplied.

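As $x_{id}$ can only take two possible values, we have
$p(x_{id} = 1 \mid \beta_{*d}, z_{i*}) = 1 - \prod_k \beta_{kd}^{z_{ik}}$ such that the full likelihood of the bit $x_{id}$ is

$$ p(x_{id} \mid \beta_{*d}, z_{i*}) = \left[ \prod_k \beta_{kd}^{z_{ik}} \right]^{1 - x_{id}} \left[ 1 - \prod_k \beta_{kd}^{z_{ik}} \right]^{x_{id}} . \tag{7} $$

According to this likelihood, the different entries in X are conditionally independent given
the parameters Z and $\beta$. Therefore, the complete data likelihood factorizes over users and
permissions: $p(X \mid \beta, Z) = \prod_{i=1}^{N} \prod_{d=1}^{D} p(x_{id} \mid \beta_{*d}, z_{i*})$.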
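
To make the marginal likelihood concrete, the following Python sketch evaluates $\log p(X \mid \beta, Z)$ and, for small K, verifies the marginalization by brute-force enumeration of all role vectors $u_{*d}$. The function names are hypothetical, and the log-likelihood assumes that no entry probability is exactly 0 or 1.

```python
import math
from itertools import product


def p_x0(beta_d, z_i):
    """p(x_id = 0 | beta_{*d}, z_{i*}) = product over k of beta_kd ** z_ik."""
    return math.prod(b ** z for b, z in zip(beta_d, z_i))


def log_likelihood(X, Z, beta):
    """log p(X | beta, Z), a sum over users i and permissions d (Eq. 7)."""
    total = 0.0
    for x_i, z_i in zip(X, Z):
        for d, x in enumerate(x_i):
            p0 = p_x0([row[d] for row in beta], z_i)
            total += math.log(p0 if x == 0 else 1.0 - p0)
    return total


def p_x0_by_enumeration(beta_d, z_i):
    """Sum over all u in {0,1}^K of p(x = 0 | u, z) * p(u | beta)."""
    total = 0.0
    for u in product((0, 1), repeat=len(beta_d)):
        p_u = math.prod(b if u_k == 0 else 1.0 - b for b, u_k in zip(beta_d, u))
        x_is_zero = not any(z_k and u_k for z_k, u_k in zip(z_i, u))
        total += p_u * x_is_zero
    return total


# Hypothetical parameters for one permission and K = 2 roles.
beta_d = [0.5, 0.2]
closed_form = p_x0(beta_d, [1, 1])
enumerated = p_x0_by_enumeration(beta_d, [1, 1])
ll = log_likelihood([[0, 1]], [[1, 1]], [[0.5, 0.5], [0.2, 0.5]])
```

With both roles assigned, the probability of not holding the permission is the product 0.5 · 0.2, and dropping the second role raises it to 0.5, which mirrors the remark that each additional role lowers the chance of a missing permission.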

3.2 Role hierarchies 

In this section we extend the core model by introducing role hierarchies. The core model 
provides a hierarchy of depth 1 as there is only one level of roles. We introduce an additional 
level, resulting in a hierarchy of depth 2. The meaning of the hierarchical relationship is 
as follows. Roles in the second layer can be sub-roles of roles in the first layer. The set of 
permissions for a super-role includes all permissions of its sub-roles. 

Our derivation is generic in that it can be used to add extra layers to a hierarchical
model. By repeated application, one can derive probabilistic models with hierarchies of any
depth. As we will see, the one-level hierarchy (flat RBAC) and the two-level hierarchy are
particularly interesting. In Section 3.3, we propose a model variant for flat RBAC and a
model variant with a two-level hierarchy.

Like the core model, the hierarchical model derived here is not restricted to role mining. 
However, we will motivate the usefulness of hierarchies by practical considerations in access
control. We assume that there exists a decomposition of the set of users into partially
overlapping groups: Users are assigned to one or more groups by a Boolean assignment 
matrix Z. Each row i represents a user and the columns k represent user-groups. In practice, 
such a decomposition may be performed by an enterprise's Human Resources Department, 
for example, by assigning users to enterprise divisions according to defined similarities of 
the employees. If such data is lacking, then the decomposition may just be given by the 
differences in the assigned permissions for each user. For simplicity, the matrix Z has the 
same notation as in the last section. 

We now introduce a second layer. We assume that there is a decomposition of the permissions
such that every permission belongs to one or more permission-groups. These memberships
are expressed by the Boolean assignment matrix Y. Here the l-th row of Y represents
the permission-group l and the d-th column is the permission d. The assignment of permissions
to permission-groups can be motivated by the technical similarities of the resources that the 
permissions grant access to. For example, in an object-oriented setting, permissions might 
be grouped that execute methods in the same class. Alternatively, permissions could be cat- 
egorized based on the risk that is associated with granting someone a particular permission. 
Of course, permissions can also be grouped according to the users who own them. 

We denote user-groups by business roles whereas we call permission-groups technical 
roles. Business roles are assigned to technical roles. We represent these assignments by a 
matrix V. To keep track of all introduced variables, we list the types of the above-mentioned 
Boolean assignment matrices: 

• Users i to permissions d: $x_{id} \in \{0,1\}$, where $i \in \{1,\dots,N\}$, $d \in \{1,\dots,D\}$.

• Users i to business roles k: $z_{ik} \in \{0,1\}$, where $k \in \{1,\dots,K\}$.

• Business roles k to technical roles l: $v_{kl} \in \{0,1\}$, where $l \in \{1,\dots,L\}$.

• Technical roles l to permissions d: $y_{ld} \in \{0,1\}$.

Throughout this section, the indices i, d, k, and l have the above scope and are used to index
the above items.

Starting with this additional layer of roles, one can recover a flat hierarchy by collapsing
the role-role assignment matrix and the role-permission assignments using the disjunction
rule $u_{kd} = \bigvee_l v_{kl} \wedge y_{ld}$. Thereby, U can be understood as the role-permission assignment
matrix from the last section. With this structure, the final $N \times D$ user-permission assignment
matrix X is determined by two Boolean matrix products

$$ X = Z * U = Z * V * Y \quad \text{with} \quad x_{id} = \bigvee_k \left[ z_{ik} \wedge \left( \bigvee_l v_{kl} \wedge y_{ld} \right) \right] . \tag{8} $$
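
The collapse of the two-level hierarchy into a flat one can be illustrated with a short Python sketch. The matrices below are hypothetical toy data, and `boolean_product` is a helper name introduced here for the Boolean matrix product.

```python
def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is OR over k of A[i][k] AND B[k][j]."""
    return [[int(any(a_i[k] and B[k][j] for k in range(len(B))))
             for j in range(len(B[0]))] for a_i in A]


# Hypothetical example: 2 users, 2 business roles, 3 technical roles, 4 permissions.
Z = [[1, 0], [1, 1]]                             # users -> business roles
V = [[1, 0, 0], [0, 1, 1]]                       # business roles -> technical roles
Y = [[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # technical roles -> permissions

U = boolean_product(V, Y)                        # collapsed role-permission matrix
X_flat = boolean_product(Z, U)                   # Z * U
X_two_level = boolean_product(boolean_product(Z, V), Y)  # (Z * V) * Y
```

Both groupings yield the same user-permission matrix, since a user holds a permission exactly when at least one path through a business role and a technical role connects them.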



Equation (8) expresses when a user i is assigned to permission d. There exists one Boolean
matrix product per role layer. Note that for a given RBAC configuration, we can also partially 
collapse hierarchies. In particular, this makes sense when a business role is directly linked to 
permissions. We are again interested in the probability of such an assignment. Starting from 
this logical expression, we derive below how likely it is to observe an assignment of a user i 
to a permission d. 

The deterministic assignment rule for two layers of roles is graphically illustrated in
Figure 4(a): a user is assigned to a permission if there is at least one path in the graph
connecting them. As this figure indicates, a user can be assigned to a permission in multiple
ways, that is, there may be multiple paths. It is therefore easier to express how a user may
not be assigned to a permission (we denote this by $\bar{x} := \neg x$) rather than computing
the union over all possible assignment paths. Also, we will abbreviate parameters by
$z^+ := p(z = 1)$ and, for
independent variables, collect these probabilities in sets such as $z^+_{i*} := \{z^+_{i1}, \dots, z^+_{iK}\}$.



$$ \bar{x}_{id} = \neg \bigvee_{k=1}^{K} \left( z_{ik} \wedge \bigvee_{l=1}^{L} (v_{kl} \wedge y_{ld}) \right) . \tag{9} $$



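As shown in Appendix A.2, this probability is

$$ p(\bar{x}_{id} \mid z^+_{i*}, y^+_{*d}, v^+) = \prod_k \left( 1 - z^+_{ik} + z^+_{ik} \prod_l \left( (1 - y^+_{ld}) + y^+_{ld} (1 - v^+_{kl}) \right) \right) . \tag{10} $$

We condition this expression on the binary entries of Y and Z.

$$ p(\bar{x}_{id} \mid z_{i*}, y_{*d}, v^+) = \prod_k \left[ \prod_l (1 - v^+_{kl})^{y_{ld}} \right]^{z_{ik}} = \prod_{k,l} (1 - v^+_{kl})^{z_{ik} y_{ld}} \tag{11} $$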

As this expression is independent of other matrix entries in X, we can express the complete 
likelihood of the user-permission assignment matrix given the business roles and technical 
roles as a product over users and permissions. 

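$$ \begin{aligned} p(X \mid Z, Y) &= \prod_{i,d} \left[ 1 - p(\bar{x}_{id} \mid z_{i*}, y_{*d}, v^+) \right]^{x_{id}} \left[ p(\bar{x}_{id} \mid z_{i*}, y_{*d}, v^+) \right]^{1 - x_{id}} \\ &= \prod_{i,d} \left[ 1 - \prod_{k,l} (1 - v^+_{kl})^{z_{ik} y_{ld}} \right]^{x_{id}} \left[ \prod_{k,l} (1 - v^+_{kl})^{z_{ik} y_{ld}} \right]^{1 - x_{id}} \end{aligned} \tag{12} $$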



If we treat $v_{kl}$ as a random variable with probability $p(v_{kl} = 0) = (1 - v^+_{kl})$, then this
likelihood resembles the one with only one layer of roles. The only differences are the
additional binary variables $y_{ld}$ in the exponent that, like $z_{ik}$, can switch off individual terms of
the product. Thereby, we chose to condition on Y and Z and leave $v_{kl}$ random. We could
as well have conditioned on V and Z and inferred $y_{ld}$. This alternative points to a generic
inference strategy for role hierarchies of arbitrary size: one treats the assignment variables
in one layer as random variables and conditions on the current state of the others. We will
demonstrate such an alternating inference scheme on a two-level hierarchy in Section 3.3.
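
The resemblance to the one-layer likelihood can be checked numerically. The sketch below assumes the conditioned form $\prod_{k,l} (1 - v^+_{kl})^{z_{ik} y_{ld}}$ for the probability of a missing assignment and compares it with a flat model whose parameters are $\beta_{kd} = \prod_l (1 - v^+_{kl})^{y_{ld}}$. All function names and numbers are hypothetical, and the log-likelihood assumes that no entry probability is exactly 0 or 1.

```python
import math


def p_not_assigned(z_i, y_d, v_plus):
    """Product over k, l of (1 - v_plus[k][l]) ** (z_i[k] * y_d[l])."""
    return math.prod((1.0 - v_plus[k][l]) ** (z_i[k] * y_d[l])
                     for k in range(len(z_i)) for l in range(len(y_d)))


def collapsed_beta(y_d, v_plus):
    """Flat-model analogue: beta_kd = product over l of (1 - v_plus[k][l]) ** y_d[l]."""
    return [math.prod((1.0 - v) ** y for v, y in zip(row, y_d)) for row in v_plus]


def log_likelihood(X, Z, Y, v_plus):
    """log p(X | Z, Y), summing the per-entry terms over users and permissions."""
    total = 0.0
    for x_i, z_i in zip(X, Z):
        for d, x in enumerate(x_i):
            p0 = p_not_assigned(z_i, [row[d] for row in Y], v_plus)
            total += math.log(p0 if x == 0 else 1.0 - p0)
    return total


z_i = [1, 0]                        # user holds business role 0 only
y_d = [1, 1]                        # permission d belongs to both technical roles
v_plus = [[0.5, 0.2], [0.9, 0.9]]   # p(v_kl = 1) for each business/technical role pair
p0 = p_not_assigned(z_i, y_d, v_plus)
beta_d = collapsed_beta(y_d, v_plus)
p0_flat = math.prod(b ** z for b, z in zip(beta_d, z_i))
```

Setting an entry of `y_d` or `z_i` to 0 removes the corresponding factors from the product, which is the "switching off" of terms described above.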



3.3 Overparametrization and instantiation by introducing constraints 

The above model of user-permission assignments defines a very general framework. In princi- 
ple, one can iteratively introduce additional layers in the role hierarchy without substantially 
changing the outer form of the likelihood. In the derivations, we have avoided any prior 
assumptions on the probabilities of the entries of V, Y, and Z. We have only exploited the 
fact that these variables are Booleans and, therefore, only take the values 0 or 1. We have
also avoided any assumptions about the processes that lead to a particular decomposition of 
the set of users and the set of permissions. Moreover, we have not specified any constraints on 
the user decomposition, the permission decomposition, or the assignments from user-groups 
to permission-groups. 

It turns out that the model with a two-level hierarchy already has more degrees of free- 
dom than is required to represent the access control information present in many domains 
that arise in practice. In particular, when only the data X is given, there is no information 
available on how to decompose the second role level. This lack of identifiability becomes 
obvious when we think about a one-level decomposition with role-permission assignments U 
(as in Eq. Q) and try to convert it into a two-level decomposition. The Z, U decomposition
has already sufficiently many degrees of freedom to fit any binary matrix. Further decom- 
posing U into an extra layer of roles V and assignments from these roles to permissions Y is 
arbitrary when there does not exist additional information or constraints. Therefore, the flat 
RBAC configuration with only one role layer is the most relevant one. The two-level hierar-
chy without constraints is over-parameterized. Such a hierarchy can be seen as a template
for an entire class of models. By introducing constraints, we can instantiate this template
to specialized models that fit the requirements of particular RBAC environments and have
a similar model complexity as flat RBAC without constraints. These instances of the model
class are given by augmenting unconstrained two-level RBAC with assumptions on the proba-
bility distributions of the binary variables and giving constraints on the variables themselves.
In the following, we will present two relevant model instances and explain their relationship.
Later, we will extend the models with generation processes for exceptional assignments.

Figure 4: Illustration of the structure of three model instances. A user has a permission if
there is at least one path connecting them. a) Full model. b) Model with trivial decomposition
of the permissions (flat RBAC). c) Disjoint Decomposition Model with only one business role
per user and one technical role per permission.



Flat RBAC 



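In this model, each permission is restricted to be a member of only one permission-group and
each permission-group can contain only a single permission. Formally:
$(\forall l : \sum_{d} y_{ld} = 1) \wedge (\forall d : \sum_{l} y_{ld} = 1)$. The conditioned likelihood then becomes

$$p_{\text{flat}}(X \mid Z) = \prod_{i,d} \Bigl[ 1 - \prod_{k} (1 - v^{+}_{kd})^{z_{ik}} \Bigr]^{x_{id}} \Bigl[ \prod_{k} (1 - v^{+}_{kd})^{z_{ik}} \Bigr]^{1 - x_{id}} \qquad (13)$$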



This "collapsed" model is equivalent to flat RBAC without constraints. This can be seen
by renaming v by u and L by D. As each technical role serves as a proxy for exactly one
permission, we have D = L anyway. A graphical representation of the structure of this model
instance is given in Figure 4b. Equivalently, we could introduce one-to-one constraints on
the user-role assignment and collapse Z instead of Y, leading to a model with the same
structure.
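
The collapse can be checked numerically: with a one-to-one assignment of technical roles to permissions, the probability that a user lacks a permission is the same under the two-level and the flat model. This is a minimal sketch assuming NumPy; the function names and the random test data are illustrative only.

```python
import numpy as np

def log_p_absent_two_level(Z, Y, v_plus):
    # log p(x_id = 0) = sum_{k,l} z_ik * y_ld * log(1 - v_kl)
    return Z @ np.log1p(-v_plus) @ Y

def log_p_absent_flat(Z, u_plus):
    # log p(x_id = 0) = sum_k z_ik * log(1 - u_kd)
    return Z @ np.log1p(-u_plus)

rng = np.random.default_rng(0)
N, K, D = 6, 3, 4
Z = rng.integers(0, 2, size=(N, K))             # users may hold several roles
v_plus = rng.uniform(0.05, 0.95, size=(K, D))   # L = D after the collapse
Y = np.eye(D, dtype=int)                        # one technical role per permission

same = np.allclose(log_p_absent_two_level(Z, Y, v_plus),
                   log_p_absent_flat(Z, v_plus))
```

Here `Y` is the identity matrix, so renaming `v_plus` to the role-permission probabilities of the flat model changes nothing in the likelihood.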



Disjoint decomposition model (DDM) 

This model has even stronger constraints. Namely, the number of user-groups per user is
limited to $k_{\max} = 1$ and the number of assigned
permission-groups per permission is limited to $l_{\max} = 1$. This formalizes that each user
belongs to exactly one user-group and each permission belongs to exactly one permission-
group. Hence, both users and permissions are respectively partitioned into disjoint business
roles and technical roles. A disjoint decomposition substantially reduces the complexity of
a two-level hierarchy while still retaining a high degree of flexibility since users of a given
user-group may still be assigned to multiple permission-groups. We illustrate this model in
Figure 4c.



3.4 Prior assumptions on probabilities 

A central question in role mining is how many roles are required to explain a given dataset. In
this paper, we take two different approaches to determining the number of roles k. For the flat
RBAC model, we treat k as a fixed model parameter. One must therefore repeatedly run the
algorithm optimizing this model for different k and select the result according to an external
measure. In our experiments, we will tune the number of roles k by cross-validation using the
generalization error as the external quality measure. For DDM with a two-level hierarchy,
we explicitly include prior assumptions on the number of roles into the model using a non-
parametric Bayesian approach. This way, the role mining algorithm can internally select the
number of roles.

Instead of directly providing the number of roles for DDM, we only assume that, given an 
RBAC configuration with roles and user-role assignments, the a priori probability of a new 
user being assigned to one of the roles depends linearly on the number of users having this 
role (plus a nonzero probability of creating a new role). This assumption reflects that it is 
favorable to assign a new user to existing roles instead of creating new roles when users enter 
the enterprise. This rich-get-richer effect indirectly influences the number of roles as, under 
this assumption, an RBAC configuration with few large roles (large in the number of users) 
is more likely than a configuration with many small roles. 

Our assumption is modeled by a Dirichlet process prior [TJ[TT]. Let N be the total number 
of users and let $N_k$ be the number of users that have role k (the cardinality of the role). The
Dirichlet process is parametrized by the nonnegative concentration parameter $\alpha$.

$$p(z_{i'k} = 1 \mid Z_{\setminus i'}, \alpha) = \begin{cases} \dfrac{N_k}{N - 1 + \alpha} & N_k > 0 \\[6pt] \dfrac{\alpha}{N - 1 + \alpha} & N_k = 0 \end{cases} \qquad (14)$$

Here, we used the short-hand notation $Z_{\setminus i'}$ for the role assignments of all users except (the
hypothetically new) user i'. The event where user i' is assigned a role with $N_k = 0$ corresponds
to creating a new role with exactly the permissions of i'.
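
The rich-get-richer behavior of Eq. (14) is easy to see in code. The following is a minimal sketch assuming NumPy; `counts` holds the role cardinalities $N_k$ over all users except the new one (so they sum to N - 1), and the function names are hypothetical.

```python
import numpy as np

def crp_role_probabilities(counts, alpha):
    """Prior probabilities of Eq. (14): one entry per existing role,
    plus a final entry for opening a new role."""
    counts = np.asarray(counts, dtype=float)
    denom = counts.sum() + alpha          # equals N - 1 + alpha
    return np.append(counts, alpha) / denom

def sample_role(counts, alpha, rng):
    """Draw a role index; index len(counts) means 'create a new role'."""
    probs = crp_role_probabilities(counts, alpha)
    return int(rng.choice(len(probs), p=probs))
```

For example, with two existing roles of sizes 3 and 1 and `alpha = 1.0`, the larger role is three times as likely as the smaller one, and a new role is as likely as the smaller one.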

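The business-role to permission-role assignments are distributed according to a Bernoulli
distribution $p(v_{kl} \mid Z, Y, \beta) = \beta_{kl}^{1 - v_{kl}} (1 - \beta_{kl})^{v_{kl}}$. Again, the model parameter $\beta_{kl}$ accounts
for the probability that the assignment is not active. We introduce a symmetric Beta prior
for $\beta_{kl}$:

$$p(\beta_{kl} \mid \gamma) = B(\gamma, \gamma)^{-1} \bigl( \beta_{kl} (1 - \beta_{kl}) \bigr)^{\gamma - 1} = \frac{\Gamma(2\gamma)}{\Gamma(\gamma)^{2}} \bigl( \beta_{kl} (1 - \beta_{kl}) \bigr)^{\gamma - 1} , \qquad (15)$$

where B(., .) and $\Gamma(.)$ are the beta function and the gamma function, respectively. We derive
the update equations for a Gibbs sampler on this model in Appendix A.3.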

With the Dirichlet process prior on the user-role and the role-permission assignments and 
the Beta-Bernoulli likelihood, DDM is equivalent to the "infinite relational model" (IRM) 
proposed in [23]. We will use a similar Gibbs-sampling algorithm to infer the model param- 
eters of DDM as used in [23] . While [23] use hill climbing to infer the model parameters, we 
repeatedly sample the parameters from the respective probabilities and keep track of the most 
likely parameters. We graphically illustrate the model with all introduced prior assumptions 
in Figure 5. There, we use different hyperparameters $\alpha_1$ and $\alpha_2$ for the assignments Z and
Y. However, in all calculations we use the same hyperparameters $\alpha = \alpha_1 = \alpha_2$.



Figure 5: Graphical model for DDM with prior distributions. Filled circles are observed
random variables and empty circles are hidden. Solid points denote fixed parameters of
the model. Arrows indicate dependencies. Entities on an N-rectangle exist in N different
realizations.



Nonparametric Bayesian priors could, in principle, also be added to the flat RBAC model 
proposed above. However, as this model allows users to have multiple roles, the Dirichlet 
process prior is not applicable to Z. Instead, the "Indian Buffet Process" [TO] could be used 
as a prior. In our experiments, we compare with an algorithm that combines such a prior 
with a noisy-OR likelihood [39].

3.5 Noise



The goal of role mining is to infer the role structure underlying a given access control matrix. 
In this context, structure means permissions that are frequently assigned together and users 
that share the same set of permissions (not necessarily all their permissions) . User-permission 
assignments that do not replicate over the users do not account for the structure. We call 
such exceptional assignments noise. A noisy bit can be an unusual permission or a missing 
assignment of a permission. 

In this section we add an explicit noise process to the flat RBAC model. Let $x^{S}_{id}$ be
a "structural" bit and let $x^{N}_{id}$ be a "noise" bit. While a structural bit is generated by the
generation process of the user-permission assignments in the flat RBAC scenario given by
Eq. (13), a noise bit is generated by a random coin flip with probability

$$p_N(x^{N}_{id} \mid r) = r^{x^{N}_{id}} (1 - r)^{1 - x^{N}_{id}} . \qquad (16)$$

Let $\xi_{id}$ be a binary variable indicating if the observed bit $x_{id}$ is a noise bit or a structure
bit. Then the observed bit is

$$x_{id} = (1 - \xi_{id})\, x^{S}_{id} + \xi_{id}\, x^{N}_{id} . \qquad (17)$$

With the structure and noise distribution combined, the resulting probability of an ob-
served $x_{id}$ is

$$p(x_{id} \mid \mathcal{L}_i, \beta, r, \xi_{id}) = p_N(x_{id} \mid r)^{\xi_{id}}\, p_S(x_{id} \mid \mathcal{L}_i, \beta)^{1 - \xi_{id}} . \qquad (18)$$

In Appendix A.4 we marginalize out $\xi$ and obtain the final likelihood of this model.

$$p_{\text{MAC}}(X \mid Z, \beta, r, \epsilon) = \prod_{i,d} \bigl( \epsilon \cdot p_N(x_{id} \mid r) + (1 - \epsilon) \cdot p_{\text{flat}}(x_{id} \mid Z, \beta) \bigr) \qquad (19)$$



The generation process underlying this model is depicted in Figure 6. The advantage of an ex-
plicit noise process over threshold-based denoising methods [1] is the ability to automatically
adapt to the data's noise level. Another way to deal with noise is denoising in the continuous
domain as a preprocessing step for a generic role mining method. Such an approach using
SVD has shown good performance [31]. While this step is computationally inexpensive, it
requires selecting a cutoff threshold.
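
The mixture likelihood of Eq. (19) can be sketched in a few lines. This is a minimal illustration assuming NumPy and assuming that `beta[k, d]` is the probability that role k does not grant permission d, so that it plays the part of the "not assigned" probability in Eq. (13). The function name and the argument `eps_noise` (the mixing weight of the noise process) are labels chosen here.

```python
import numpy as np

def mac_log_likelihood(X, Z, beta, r, eps_noise, tiny=1e-12):
    """Log of Eq. (19) for Boolean X (N x D) and Z (N x K)."""
    # Structure part: p(x_id = 0) = prod_k beta_kd^{z_ik}
    p_zero = np.exp(Z @ np.log(np.clip(beta, tiny, 1.0)))
    p_struct = np.where(X == 1, 1.0 - p_zero, p_zero)
    # Noise part: a coin flip that yields 1 with probability r
    p_noise = np.where(X == 1, r, 1.0 - r)
    mix = eps_noise * p_noise + (1.0 - eps_noise) * p_struct
    return float(np.sum(np.log(np.clip(mix, tiny, 1.0))))
```

Setting `eps_noise = 0` recovers the noise-free flat likelihood, while `eps_noise = 1` ignores the role structure entirely.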




Figure 6: The generative model of Boolean MAC with mixture noise. $\mathcal{L}_i$ is the assignment
set of user i, indicating which roles from U generated the user's permissions. The bit $\xi_{id}$
selects whether the noise-free bit $x^{S}_{id}$ or the noise bit $x^{N}_{id}$ is observed. This model is an
extension of the model in Fig. 3.



4 Learning the model parameters 

In the last section we derived a class of probabilistic models. To apply these models to the 
role mining problem, we require algorithms that learn the model parameters from a given 
access control matrix. In this section we present two such learning algorithms. In particular, 
we use an annealed version of expectation-maximization (EM) for the models with a fixed 
number of parameters and Gibbs sampling for the non-parametric model variants.

4.1 Annealed EM



When applying the well-known EM algorithm to clustering problems, one alternates between
updating the expected cluster assignments given the current centroids (E-step) and updating
the centroids given the current assignments (M-step). In the case of the proposed model for
multi-assignment clustering (MAC), the E-step computes the expected assignment $\gamma_{\mathcal{L},i}$ of a
user i to role set $\mathcal{L}$ for each role set and each user:

$$\gamma_{\mathcal{L},i} = p(\mathcal{L} \mid x_{i*}, \beta, \epsilon, r)^{1/T} \Bigl( \sum_{\mathcal{L}' \in \mathbb{L}} p(\mathcal{L}' \mid x_{i*}, \beta, \epsilon, r)^{1/T} \Bigr)^{-1} . \qquad (20)$$

Here, we have introduced the computational temperature T > 0. The case with T = 1
reproduces the conventional E-step. The limit $T \to \infty$ yields the uniform distribution over
all role sets and a low temperature T makes the expectation of the assignments "crisp"
(close to 0 or 1). The normalization ensures that the sum over all $\mathcal{L} \in \mathbb{L}$ equals 1.
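
The effect of the temperature in Eq. (20) can be illustrated with a tempered softmax. This is a minimal sketch assuming NumPy, where `log_post[i, c]` holds the (possibly unnormalized) log posterior of role set c for user i. How these log posteriors are computed is left out, and the function name is hypothetical.

```python
import numpy as np

def annealed_responsibilities(log_post, T):
    """Tempered E-step: raise each posterior to the power 1/T and renormalize
    over role sets (columns). Rows of the result sum to 1."""
    scaled = np.asarray(log_post, dtype=float) / T
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(scaled)
    return gamma / gamma.sum(axis=1, keepdims=True)
```

With `T = 1` this is the ordinary normalized posterior; a very large `T` flattens each row towards the uniform distribution, and a small `T` concentrates almost all mass on the most probable role set.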

While iterating this modified E-step and the conventional M-step, we decrease the tem- 
perature starting from a value of the order of the negative log-likelihood costs. As a result, 
the local minima of the cost function are less apparent in the early stage of the optimization. 
In this way, lower minima can be identified than with conventional EM, although there is 
no guarantee of finding the global minimum. In addition to this robustness effect of the 
annealing scheme, we obtain the desired effect that, in the low-temperature regime of the
optimization, the user-role assignments are pushed towards 0 and 1. As we are ultimately
interested in binary user-role assignments, we benefit from forcing the model to make crisp 
decisions. 

We provide the update equations of all model parameters used in the M-step in Ap-
pendix A.5. The rows of $\beta$ are initialized with random rows of $1 - X$ and $(\epsilon, r)$ are initialized
with (0.1, 0.5). The annealed optimization is stopped when the last user was assigned to a
single role set with a probability exceeding 1 — 10^^. 

4.2 Gibbs sampling 

For nonparametric model variants as, for instance, the DDM with a Dirichlet process for 
the user-role assignments, we employ Gibbs sampling to learn the model parameters. Gibbs 
sampling iteratively samples the assignment of a user to one of the currently existing roles or 
to a new role while keeping the assignments of all other users fixed. This scheme explicitly 
exploits the exchangeability property of these models. This property states that the ordering 
in which new objects are randomly added to the clusters does not affect the overall distribution 
over all clusterings. 

All terms involved in the sampling step are derived in Appendix A.3. The probability
for assigning the current user to a particular role is given in Eq. (52). The Gibbs sampler
alternates between iterating over all user-role assignments and over all permission-role as- 
signments. It stops if the assignments do not change significantly over several consecutive 
iterations or if a predefined maximum number of alternations is reached. While running 
the sampler, the algorithm stores the state with the maximum a-posteriori probability and 
reports this state as the output. This book-keeping leads to worse scores than computing the 
estimated score by averaging over the entire chain of sampled RBAC configurations. How- 
ever, this restriction to a single solution reflects the practical constraints of the role mining 
problem. Ultimately, the administrator must choose a single RBAC configuration. 



5 Experiments 

In this section we experimentally investigate the proposed models on both artificial and real- 
world datasets. We start by comparing MAC and DDM on datasets where we vary the 
noise level. Afterwards we compare MAC and DDM with other methods for Boolean matrix 
factorization on a collection of real-world datasets.

5.1 Comparison of MAC and DDM

MAC and DDM originate from the same core model. However, they differ in the following
respects: First, DDM has one extra layer of roles. This additional layer, encoded in the 
assignment matrix Y, creates a clustering of the permissions. Therefore, DDM has a two- 
level role hierarchy while MAC models flat RBAC. Second, DDM has additional constraints 
on its assignment matrices Z and Y dictating that the business roles must be disjoint in 
terms of their users and the technical roles must be disjoint in terms of their permissions. 
The assignments of business roles to technical roles V are unconstrained. MAC has no 
constraints at all. A user can have multiple roles and permissions can be assigned to multiple 
roles. The last difference between the two models concerns the prior assumptions on the model
parameters. While MAC implicitly assumes uniform prior probabilities for its parameters, 
DDM makes explicit non-uniform assumptions encoded in the Beta priors and the Dirichlet 
priors. 

To evaluate which of the two model variants is best suited to solve the role mining problem, 
we design the following experiment. We generate access control data in two different ways. 
One half of the datasets is generated according to the MAC model. We take a set of roles and 
randomly assign users to role combinations to create a user-permission assignment matrix. 
Then, we randomly select a defined fraction of matrix entries and replace them with the 
outcome of a fair coin flip. Some users in these datasets are generated by multiple roles. 
The second set of datasets is generated from the DDM probability distribution by repeatedly 
sampling business roles and technical roles from the Dirichlet process priors and randomly 
connecting them according to the Beta-Bernoulli probabilities. 
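
The generation of the MAC-type datasets described above can be sketched as follows. This is a minimal illustration assuming NumPy. The function name, the argument names, and the convention that noise replaces a fixed number of randomly chosen entries are assumptions made here; the role matrix `U` and the candidate role sets are supplied by the caller.

```python
import numpy as np

def generate_mac_data(U, role_sets, n_users, noise_fraction, rng):
    """U: K x D Boolean role-permission matrix.
    role_sets: list of tuples of role indices a user may be assigned.
    Returns the noisy matrix, the noise-free matrix, and the user-role matrix."""
    choice = rng.integers(len(role_sets), size=n_users)
    Z = np.zeros((n_users, U.shape[0]), dtype=int)
    for i, c in enumerate(choice):
        Z[i, list(role_sets[c])] = 1
    # A user gets a permission if at least one of the user's roles grants it
    X_clean = (Z @ U > 0).astype(int)
    # Replace a fraction of the entries with the outcome of a fair coin flip
    flat = X_clean.ravel().copy()
    n_noisy = int(round(noise_fraction * flat.size))
    idx = rng.choice(flat.size, size=n_noisy, replace=False)
    flat[idx] = rng.integers(0, 2, size=n_noisy)
    return flat.reshape(X_clean.shape), X_clean, Z
```

Because the replaced entries are fair coin flips, about half of them keep their original value, so the number of actually flipped bits is roughly half of the selected fraction.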

On both kinds of data sets, we infer RBAC configurations with DDM and with MAC. In 
this way, the model assumptions always hold for one of the models while the other one operates 
in a model-mismatch situation. Moreover, data from DDM can have an arbitrary number of 
underlying roles. Therefore, MAC requires an additional model-order selection mechanism. 
Cross-validation with the generalization test described in Section 2.2 is employed as a quality
measure. 

We control the difficulty of the experiments by varying the noise level in the data. For 
each noise level, we sample 30 datasets from each model variant, each with 400 users and 50 
permissions. On each dataset, we run each model 10 times and select the solution with the 
highest internal quality score, respectively. Finally, we evaluate the inferred roles based on 
their generalization error on a hold-out test set. 

Results. We report the median generalization error of the inferred matrix decompositions
and the 25%/75%-percentiles in Figure 7. The left plot depicts the generalization error on
DDM data and the right plot shows the error on MAC data.

We see that the overall trend of both models is similar for both types of data. The
generalization error increases with increasing noise. There are two explanations for this
behavior. First, the problem of estimating the structure of the data becomes increasingly
difficult when increasing the noise level. Second, noisy bits are likely to be wrongly predicted,
even when the data structure is learned well.

We also see that MAC and DDM generalize almost equally well on DDM data. MAC
is even slightly better than DDM. In contrast, DDM achieves a worse generalization error
than MAC on MAC data in the intermediate noise range. One would expect that each model
generalizes best on data that is generated according to its assumptions. This behavior can be
observed on MAC data. However, on DDM data MAC is as good as DDM. The reason is that
for DDM data, the model assumptions of MAC are in fact not violated. Even though DDM
has an extra layer of roles, this model instance is less complex than flat RBAC (which MAC
models). One can see this by collapsing one DDM layer. For instance, define u' := v * y. 
Then permissions can be assigned to multiple roles (because there are no constraints on the 
business-role to technical-role assignment v). At the same time, z still provides a disjoint 
clustering of the users. In this flat RBAC configuration (z, u') the roles overlap in terms of 
their permissions but not in terms of the users that are assigned to the roles. The same model 
structure arises in single-assignment clustering (SAC), a constrained variant of MAC, where 
users can have only one role. As a consequence, we can interpret DDM as a SAC model 
with prior probabilities on the model parameters. In contrast to MAC, SAC yields inferior 
parameter estimates as it must fit more model parameters for the same complexity of data. 
Hence, it has a larger generalization error. 
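
The collapse of one DDM layer can be made concrete with a toy example. This is a minimal sketch assuming NumPy; the small matrices are invented for illustration, and `bool_product` is a helper defined here for the Boolean matrix product.

```python
import numpy as np

def bool_product(A, B):
    """Boolean matrix product: entry is 1 if any path connects row and column."""
    return (A @ B > 0).astype(int)

# Toy DDM configuration: disjoint user clustering Z, disjoint permission
# clustering Y, and unconstrained links V between the two kinds of roles.
Z = np.array([[1, 0], [1, 0], [0, 1]])          # 3 users, 2 business roles
V = np.array([[1, 1], [0, 1]])                  # business role -> technical role
Y = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])      # 2 technical roles, 4 permissions

U_prime = bool_product(V, Y)                    # collapsed role-permission matrix
X_two_level = bool_product(bool_product(Z, V), Y)
X_flat = bool_product(Z, U_prime)
```

In this example both rows of `U_prime` contain the last two permissions, so the collapsed roles overlap in their permissions, while every row of `Z` still contains exactly one role.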

In contrast to the structural difference between DDM and MAC, the differences in the 
optimization algorithm and in the Bayesian priors have only a minor influence on the results.
It appears that MAC can compensate for a missing internal mechanism for finding the appro- 
priate number of roles when an external validation step is provided. Also, the Gibbs sampling 
scheme and the deterministic annealing algorithm perform equally well on the DDM data. 
Given that the prior Beta distributions for the Bernoulli variables of the DDM provide no 
improvement over MAC, it seems unnecessary to extend MAC with such priors. 

We experimentally confirmed that MAC is a more general model than DDM. We also 
found that the generalization error provides a good criterion for selecting the number of roles 
without making explicit prior assumptions on the distribution of this number. We therefore 
recommend using the MAC model for role mining with real-world datasets. 



5.2 Noise 



In this section we focus on erroneous user-permission assignments. As explained in Section 2.1,
the user-permission assignment matrix UPA given as an input to the role mining problem 
contains exceptional or erroneous entries, which we refer to as noise. More precisely, we 
assume that a hidden noiseless assignment UPA' exists, but only UPA, a noisy version of 
it, is observable. By inferring a role structure RC = (z, u) that supposedly underlies the 
input matrix UPA, our probabilistic models approximately reconstruct the noiseless matrix 
as z * u. In synthetic experiments, one can compare this reconstruction against the noise- 
free assignment UPA' . We investigate two questions. First, how conservative are the RBAC 
configurations that our algorithms find, that is, how many new errors are introduced in the 
reconstruction? Second, does the probabilistic approach provide a measure of confidence for 
reconstructed user-permission assignments? 

Figure 8 depicts error rates that provide answers to both questions. In Figure 8a we
contrast the fraction of wrong reconstruction assignments that are new with the fraction 
of wrong input assignments that have not been discovered. To obtain these values, we ran 
experiments on input matrices created from the DDM with a fraction of randomized bits that 
varies from 5% to 75%. As can be seen, MAC's ratio of newly introduced to repeated errors 
is constant for false negatives and false positives alike. DDM tends towards repeating old 
errors and introduces fewer new errors, while the sum of new and old errors is approximately 
the same as for MAC. The maximal sum of error rates is 8% false negatives and 7.5% false 
positives, which is small compared to the maximal fraction of random bits of 75%.
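
The distinction between new and repeated errors can be computed as follows when the noise-free matrix is known, as in synthetic experiments. This is a minimal sketch assuming NumPy. The function name, the dictionary keys, and the choice to normalize by the total number of matrix entries are assumptions made here, not taken from the paper.

```python
import numpy as np

def error_breakdown(clean, observed, reconstructed):
    """Split reconstruction errors into new ones (the input bit was correct)
    and repeated ones (the input bit was already wrong)."""
    clean, observed, reconstructed = (
        np.asarray(a).astype(bool) for a in (clean, observed, reconstructed))
    wrong = reconstructed != clean
    input_ok = observed == clean
    false_pos = wrong & reconstructed       # reconstructed 1, truth 0
    false_neg = wrong & ~reconstructed      # reconstructed 0, truth 1
    n = clean.size
    return {
        "new_fp": (false_pos & input_ok).sum() / n,
        "old_fp": (false_pos & ~input_ok).sum() / n,
        "new_fn": (false_neg & input_ok).sum() / n,
        "old_fn": (false_neg & ~input_ok).sum() / n,
    }
```

A conservative method in the sense discussed above would show small `new_fp` and `new_fn` values relative to the repeated errors.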



Figure 8b illustrates the empirical probability that a reconstructed user-permission as-
signment is wrong. The x-axis is the posterior probability of the true value $p(x'_{id} \mid RC)$ of this
assignment, i.e. the probability of reconstructing the assignment correctly as computed by
the learned model. The plot convincingly illustrates that the posterior provides a measure of
confidence. The stronger the model prefers a particular assignment (the closer the posterior
is to 1), the less likely it is to introduce an error if one follows this preference. This model
property means that, in addition to the RBAC configuration, our algorithms can output a
confidence score for each resulting user-permission assignment. This uncertainty estimate
could help practitioners when configuring RBAC using our methods.

Figure 8: Error analysis. 8a: How many new errors are added to the assignment versus how
many original errors were not identified? We distinguish false negatives from false positives.
8b: Trend of the empirical chance of creating a wrong assignment as a function of the
posterior probability of the correct assignment given the learned RBAC configuration. It is
apparent that the posterior provides a measure of confidence.



5.3 Experiments on real-world data 

In this section we compare MAC and DDM with other Boolean matrix factorization tech- 
niques on real-world datasets. 

The first dataset LE-access comes from a Large Enterprise. It consists of the user-
permission assignment matrix of N = 4900 users and D = 1300 permissions as well as
business attributes for each user. In this section, we will ignore the business attributes but
in Section 6 we will include them in a hybrid role mining process. The other datasets are six
publicly available datasets from HP labs [S]. They come from different systems and scenarios. 
The dataset dominos is an access-control matrix from a Lotus Domino server, customer is
the access-control matrix of an HP customer, americas_small and emea are access-control
configurations from Cisco firewalls, and firewall1 and firewall2 are created from policies
of Checkpoint firewalls.

We compare several algorithms for Boolean matrix factorization on these datasets. In 
addition to DDM and MAC, there are two other probabilistic methods that have been de- 
veloped in different contexts. Binary Independent Component Analysis (BICA) [2? learns 
binary vectors that can be combined to fit the data. These vectors, representing the roles in 
our setting, are orthogonal, that is, each permission can be assigned to only one role. In [39],
an Indian buffet process has been combined with a noisy-OR likelihood to form a nonparamet-
ric model that learns a Boolean matrix factorization. We call this model infinite noisy-OR
(INO). The noisy-OR likelihood is closely related to the likelihood of MAC. The difference is
that its noisy bits are always flipped to 1, whereas in MAC a noisy bit is a random variable
that is 1 with probability r and 0 otherwise. Similar to the Dirichlet process used in DDM,
the Indian buffet process in INO is capable of learning the number of factors (here the num-
ber of roles). Another method that we compare with is a greedy combinatorial optimization
of a matrix factorization cost function. This method, called Discrete Basis Problem solver
(DBPS), was proposed in [28]. It finds a Boolean matrix decomposition that minimizes the
distance between the input matrix and the Boolean matrix product of the two decomposition
matrices. This distance weights false 1s differently compared to false 0s, with weighting fac-
tors that must be selected. The decomposition is successively created by computing a large
set of candidate vectors (here candidate roles) and then greedily selecting one after the other
such that in each step the distance function is minimized.

For each dataset, we randomly subsample a training set containing 80% of the users and 
a hold-out test set containing the remaining 20% of the users. All model parameters are trained
on the training set and the generalization error is evaluated on the test set. We repeat this 
procedure five times with a different random partitioning of the training set and the test set. 
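
The repeated partitioning of users into training and test sets can be sketched as follows. This is a minimal illustration assuming NumPy; the function name, the default seed, and the use of a generator are choices made here. Only the 80/20 ratio and the five repetitions come from the text.

```python
import numpy as np

def user_splits(n_users, n_repeats=5, train_fraction=0.8, seed=0):
    """Yield (train_indices, test_indices) pairs over the rows of the
    user-permission matrix, one pair per random partitioning."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_fraction * n_users))
    for _ in range(n_repeats):
        perm = rng.permutation(n_users)
        yield perm[:n_train], perm[n_train:]
```

A typical use would be `for train, test in user_splits(X.shape[0]): ...`, fitting on `X[train]` and evaluating the generalization error on `X[test]`.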

We train the model parameters as follows. DDM and INO select the number of roles
internally via the Dirichlet process and the Indian Buffet process, respectively. For MAC,
BICA, and DBPS, we repeatedly split the training data into random subsets and compute
the validation error. We then select the number of roles (and other parameters for BICA and
DBPS, such as thresholds and weighting factors) with lowest validation error and train again
on the entire training set using this number.

Table 1: Results on HP labs data

Our experimental findings on the HP datasets are summarized in Table 1. We report the
number of users and permissions and scores for each method: the number of roles discovered,
the median generalization error and its average difference to the 25% and 75%-percentiles, and
the time required for one run. For americas_small and emea the errors of all methods are
within the percentiles. For customer, DDM generalizes best. For firewall1 and dominos,
DDM and MAC perform equally well and lead the ranking, whereas for firewall2 DDM is
inferior to MAC. While for all datasets MAC finds solutions with an error close to the solutions
of the "winner", DDM deviates largely from the best method in the case of firewall2.

All methods differ significantly in runtime with DBPS always being the fastest method.
However, the run-times are difficult to compare for several reasons. First, all algorithms have
been implemented by different authors in different programming languages. Second, they all
use different stopping criteria. We manually selected these criteria such that each method
achieves good results on training data, but it is impossible to tune them for a fair runtime
comparison. Finally, all algorithms, except for INO and DDM, must be run several times to
search for appropriate parameters. The runtimes reported in Table 1 account for one such
run. To find the appropriate k for MAC, DBPS, and BICA, we start with k = 2 and increase
k until the generalization error significantly increases. The final value of k given in Table 1
is thus indicative of the number of runs. For DBPS, we additionally tuned the parameters
"threshold" and "bonuses" with a grid search over 25 candidate values.

Figure 9: Generalization error on real-world data.

The results on LE-access are graphically illustrated in Figure 9. The MAC model general-
izes best. The two nonparametric Bayesian models DDM and INO have a similar performance. 
DBPS performs a bit worse and BICA has the highest generalization error. Apparently, on 
this dataset, the assumption of orthogonal roles does not hold. 



6 Hybrid role mining 

Hybrid role mining accepts as input additional information on the attributes of the users 
or business processes. The goal, as defined by the inference role mining problem, remains 
unchanged: Find the RBAC configuration that most likely underlies the observed user- 
permission assignment. As this configuration is assumed to reflect the business properties 
and the security policies of the enterprise, we approach the hybrid role mining problem by 
jointly fitting an RBAC configuration to the user-permission assignment and to the business 
information given as additional input. 

We account for this additional information by modifying the optimization problem for
role mining. The original problem is to minimize the negative log-likelihood for the MAC
model (19). The original cost function is

$$R^{(ll)} = \sum_{i, \mathcal{L}} z_{i\mathcal{L}}\, R^{(ll)}_{i\mathcal{L}} , \qquad z_{i\mathcal{L}} \in \{0, 1\}, \quad \forall i : \sum_{\mathcal{L}} z_{i\mathcal{L}} = 1 . \qquad (21)$$

Here we used the assignments $z_{i\mathcal{L}}$ from user i to the set of roles $\mathcal{L}$. With these assignments
and with the assignments $z_{\mathcal{L}k}$ from role sets to roles, a user is assigned the roles that are
contained in the role set to which he is assigned ($z_{ik} = \sum_{\mathcal{L}} z_{i\mathcal{L}} z_{\mathcal{L}k}$). The individual cost of
assigning a user i to the set of roles $\mathcal{L}$ is

$$R^{(ll)}_{i\mathcal{L}} = -\log \Bigl( \prod_{d} p_{\text{MAC}}(x_{id} \mid \mathcal{L}, \beta, r, \epsilon) \Bigr) . \qquad (22)$$

We now add an additional term to this negative log-likelihood cost function to define an
objective function for hybrid role mining. We use a linear combination of the likelihood costs
Eq. (21) and a term $R^{(S)}$ that accounts for business information costs:

$$R := R^{(ll)}/D + \lambda R^{(S)} , \qquad (23)$$

where $\lambda \ge 0$ weights the influence of the business information. The term 1/D makes the
likelihood costs independent of the number of permissions D.






Role mining without business information is a special case of Eq. (23), where $\lambda = 0$,
whereby role mining optimizes the model parameters with respect to Eq. (21). This problem
has a huge solution space, spanned by the model parameters, with many local minima. By
incorporating business information into this optimization problem ($\lambda > 0$), we introduce a
bias toward minima that correspond to the business attributes used as additional input.

We consider an RBAC configuration to be meaningful if employees satisfying identical 
business predicates (that is, having the same business attributes) are also assigned to similar 
(ideally identical) sets of roles. To account for this, we propose an objective function $R^{(S)}$
that computes the similarity of the role assignments of all pairs of users with the same
business attributes. $R^{(S)}$ compares all pairs of users $(i, i')$ having the business attribute $s$
with respect to their role assignments $(z_{i*}, z_{i'*})$. Let $w_{is} \in \{0,1\}$ encode whether user $i$ has
business attribute $s$ ($w_{is} = 1$) or not ($w_{is} = 0$). Then the cost of a user-role assignment
matrix $Z$ is

$$R^{(S)} = \frac{1}{N} \sum_s \sum_{i,i'} w_{is} w_{i's} \sum_k z_{i'k} \left(1 - 2 z_{i'k} z_{ik}\right) . \tag{24}$$

$N$ is the total number of users and $k \in \{1, \dots, K\}$ is the role index. Each user has a single
business attribute $s$, that is $\sum_s w_{is} = 1$, but can be assigned to multiple roles, $1 \leq \sum_k z_{ik} \leq K$.
The term $z_{i'k}(1 - 2 z_{i'k} z_{ik})$ in Eq. (24) computes the agreement between the binary
assignment vectors $(z_{i*}, z_{i'*})$ for all pairs of users $(i,i')$ having the same attribute $s$ (which
is the case iff $w_{is} w_{i's} = 1$). The subterm $1 - 2 z_{i'k} z_{ik}$ switches the sign of a single term
such that agreements ($z_{ik} z_{i'k} = 1$) are rewarded and differences ($z_{ik} z_{i'k} = 0$) are penalized.
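To make the pairwise cost concrete, the following is a minimal sketch that evaluates Eq. (24) literally, with the double sum over user pairs. It assumes each user carries exactly one attribute label and that the leading factor of each summand is $z_{i'k}$; the function name `business_cost_pairwise` and the toy data are illustrative and do not come from the paper.

```python
def business_cost_pairwise(attrs, Z):
    """Literal evaluation of Eq. (24).

    attrs[i] is the single business attribute of user i, so
    w_is * w_i's = 1 exactly when attrs[i] == attrs[i'].
    Z[i][k] in {0, 1} is the user-role assignment.
    """
    N = len(attrs)
    K = len(Z[0])
    total = 0
    for i in range(N):
        for j in range(N):              # j plays the role of i'
            if attrs[i] != attrs[j]:
                continue                # pair does not share an attribute
            for k in range(K):
                total += Z[j][k] * (1 - 2 * Z[j][k] * Z[i][k])
    return total / N


# Toy data (hypothetical): two "sales" users and one "it" user, two roles.
attrs = ["sales", "sales", "it"]
Z_agree = [[1, 0], [1, 0], [0, 1]]    # both sales users hold role 0
Z_differ = [[1, 0], [0, 1], [0, 1]]   # sales users hold different roles
```

In this toy setting the configuration in which users with the same attribute share a role receives the lower cost, which is the behavior the text describes.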



An alternative to Eq. (24) would be to compute the Hamming distance between the two
assignment vectors. However, this has the drawback of penalizing pairs with differently sized
role sets. We have chosen the dissimilarity function in Eq. (24) to avoid a bias towards equally
sized role sets.

Note that our objective function conceptually differs from those proposed in [51 [5U1 HP] . 
While we minimize the role dispersion of users with the same attributes, their objective is 
to minimize measures of attribute dispersion [40] of users with the same roles. We see two 
advantages of our cost function. First, it enables multiple groups of users with different 
attributes to share roles, such as users of all organizational units getting a "standard role". 
Second, it is easier to add new users in our framework as usually the attributes of new users 
are given whereas the roles are not. 

Optimization 

We now demonstrate how to optimize the new cost function for hybrid role mining using
the deterministic annealing framework presented in Section 4. Specifically, we convert the
term $R^{(S)}$ into a form that enables us to conveniently compute the responsibilities $\gamma_{i\mathcal{L}}$ in the
E-step.

The responsibility $\gamma_{i\mathcal{L}}$ of the assignment-set $\mathcal{L}$ for data item $i$ is given by

$$\gamma_{i\mathcal{L}} := \frac{\exp(-R_{i\mathcal{L}}/T)}{\sum_{\mathcal{L}'} \exp(-R_{i\mathcal{L}'}/T)} . \tag{25}$$

In Eq. (20), the individual cost terms were $R_{i\mathcal{L}} = -\log\left(p(x_{i*} \mid \mathcal{L}, \beta, \epsilon, r)\right)$. Now the full
costs are extended with $R^{(S)}$. To compute Eq. (25), we shall rewrite $R^{(S)}$ as a sum over the
individual contributions of the users.

Let $N_{sk} := \sum_i z_{ik} w_{is}$ be the number of users that have the business attribute $s$ and are
assigned to role $k$, and let $s_i$ be the attribute of user $i$. We first simplify and reorganize
Eq. (24) using these auxiliary variables

$$R^{(S)} = \frac{1}{N} \sum_s \sum_i \sum_k w_{is} \left(N_{sk} - 2 z_{ik} N_{sk}\right) = \sum_{i,k} \left( (1 - z_{ik}) \frac{N_{s_i k}}{N} - z_{ik} \frac{N_{s_i k}}{N} \right) . \tag{26}$$

In this formulation, it becomes apparent that user $i$ with attribute $s_i$ should be assigned to
the role $k$ ($z_{ik} = 1$) if many users have attribute $s_i$ and role $k$.
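The aggregation over the second user index can be sketched in a few lines. The code below assumes the count-based form $R^{(S)} = \frac{1}{N}\sum_i \sum_k (1 - 2 z_{ik})\, N_{s_i k}$, which results from summing the pairwise cost over $i'$; the function name `business_cost_counts` and the toy data are hypothetical.

```python
from collections import Counter


def business_cost_counts(attrs, Z):
    """Count-based business cost: (1/N) * sum_i sum_k (1 - 2 z_ik) * N_{s_i k}."""
    N, K = len(attrs), len(Z[0])
    # N_sk[(s, k)]: number of users with attribute s that are assigned to role k.
    N_sk = Counter()
    for s, z in zip(attrs, Z):
        for k in range(K):
            N_sk[(s, k)] += z[k]
    return sum((1 - 2 * Z[i][k]) * N_sk[(attrs[i], k)]
               for i in range(N) for k in range(K)) / N


# Toy data (hypothetical): the two "sales" users hold different roles.
attrs = ["sales", "sales", "it"]
Z = [[1, 0], [0, 1], [0, 1]]
cost = business_cost_counts(attrs, Z)
```

Only one pass over the users is needed to build the counts, instead of the quadratic loop over user pairs. A role shared by many users with the same attribute lowers the cost of every user holding it.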

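Finally, we decompose the costs into individual contributions of users and role sets. We
use the notion of user-to-role set assignments $z_{i\mathcal{L}}$ to substitute the user-role assignments by
$z_{ik} = \sum_{\mathcal{L}} z_{i\mathcal{L}} z_{\mathcal{L}k}$:

$$R^{(S)} = \sum_{i,k} \left( \Big(1 - \sum_{\mathcal{L}} z_{i\mathcal{L}} z_{\mathcal{L}k}\Big) \frac{N_{s_i k}}{N} - \sum_{\mathcal{L}} z_{i\mathcal{L}} z_{\mathcal{L}k} \frac{N_{s_i k}}{N} \right) = \sum_{i,\mathcal{L}} z_{i\mathcal{L}} \Big( \sum_{k \notin \mathcal{L}} \frac{N_{s_i k}}{N} - \sum_{k \in \mathcal{L}} \frac{N_{s_i k}}{N} \Big) = \sum_{i,\mathcal{L}} z_{i\mathcal{L}} R^{(S)}_{i,\mathcal{L}} . \tag{27}$$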



In this form we can directly compare the business cost function with the likelihood costs given
by Eq. (22). We can therefore easily compute the expectation in the E-step by substituting
the costs in Eq. (25) with $R_{i\mathcal{L}} = R^{(ll)}_{i,\mathcal{L}}/D + \lambda R^{(S)}_{i,\mathcal{L}}$.



In the iterative update procedure of the deterministic annealing scheme, one faces a
computational problem arising from a recursion. The optimal assignments $z_{i\mathcal{L}}$ depend on the
$N_{s_i k}$, which are, in turn, computed from the $z_{i\mathcal{L}}$ themselves. To make this computation at
step $t$ of our algorithm feasible, we use the expected assignments $\gamma^{(t-1)}_{i\mathcal{L}} := \mathbb{E}\big[z^{(t-1)}_{i\mathcal{L}}\big]$ of the
previous step instead of the Boolean $z^{(t)}_{i\mathcal{L}}$ to approximate $N^{(t)}_{s_i k}$ by its current expectation with
respect to the Gibbs distribution:

$$N^{(t)}_{s_i k} \approx \mathbb{E}\big[N^{(t-1)}_{s_i k}\big] . \tag{28}$$



We do not have a proof of convergence for this algorithm. However we observe that when 
running it multiple times with random initializations, it repeatedly finds the same solution 
with low costs in terms of business information and model likelihood. 
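One iteration of the adapted E-step could look as follows. This is a sketch under stated assumptions, not the authors' implementation: the expected counts are taken as $\mathbb{E}[N_{sk}] = \sum_i w_{is} \sum_{\mathcal{L}} \gamma_{i\mathcal{L}} z_{\mathcal{L}k}$, the per-user business cost as $\sum_{k \notin \mathcal{L}} N_{s_i k}/N - \sum_{k \in \mathcal{L}} N_{s_i k}/N$, and the total cost as $R^{(ll)}_{i,\mathcal{L}}/D + \lambda R^{(S)}_{i,\mathcal{L}}$. All function and variable names are hypothetical, and the likelihood costs are passed in precomputed.

```python
import math


def responsibilities(costs, T):
    """gamma_iL = exp(-R_iL/T) / sum_L' exp(-R_iL'/T), row by row."""
    gamma = []
    for row in costs:
        m = min(row)                                  # shift for numerical stability
        w = [math.exp(-(r - m) / T) for r in row]
        total = sum(w)
        gamma.append([x / total for x in w])
    return gamma


def expected_counts(gamma_prev, role_sets, attrs, K):
    """E[N_sk] computed from the previous step's responsibilities."""
    counts = {}
    for i, s in enumerate(attrs):
        row = counts.setdefault(s, [0.0] * K)
        for g, L in zip(gamma_prev[i], role_sets):
            for k in L:                               # z_Lk = 1 iff k in L
                row[k] += g
    return counts


def business_costs(attrs, role_sets, counts):
    """R^(S)_{iL}: counts of roles outside L minus counts of roles inside L, over N."""
    N = len(attrs)
    return [[sum((-1.0 if k in L else 1.0) * n for k, n in enumerate(counts[s])) / N
             for L in role_sets] for s in attrs]


def e_step(ll_costs, gamma_prev, role_sets, attrs, K, D, lam, T):
    """New responsibilities from R^(ll)_{iL}/D + lam * R^(S)_{iL}."""
    counts = expected_counts(gamma_prev, role_sets, attrs, K)
    bus = business_costs(attrs, role_sets, counts)
    total = [[ll / D + lam * b for ll, b in zip(ll_row, b_row)]
             for ll_row, b_row in zip(ll_costs, bus)]
    return responsibilities(total, T)


# Toy example (hypothetical): two users with the same attribute, two single-role sets.
role_sets = [frozenset({0}), frozenset({1})]
attrs = ["a", "a"]
gamma_prev = [[1.0, 0.0], [0.5, 0.5]]
ll_costs = [[0.0, 0.0], [0.0, 0.0]]                  # uninformative likelihood
gamma_new = e_step(ll_costs, gamma_prev, role_sets, attrs, K=2, D=1, lam=1.0, T=1.0)
```

With an uninformative likelihood, the undecided second user is pulled toward the role set already favored by the first user with the same attribute. With `lam=0.0` the responsibilities stay uniform.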



6.1 Selecting relevant user attributes 

An important step for hybrid role mining is selecting the set of user attributes used
as input for the optimizer. User attributes that do not provide information about the users' 
permissions should not be used for hybrid role mining. In fact, requiring that the roles group 
together users with irrelevant attributes can result in inferred RBAC configurations that are 
worse than those inferred without using the attributes. 

To select appropriate user attributes, we propose an information theoretic measure of 
relevance. Let the random variable $X_d \in \{0, 1\}$ be the assignment of permission $d$ to a
generic user. $S$ is the random variable that corresponds to the business attribute of a generic
user (e.g. "job code") and let $s$ be one of the actual values that $S$ can take (e.g. "accountant").
Let $p(x_d) := 1/N \cdot \sum_i x_{id}$ be the empirical probability of $d$ being assigned to an unspecified
user, and let $p(x_d \mid S = s) := 1/N_s \cdot \sum_i x_{id} w_{is}$, where $N_s := \sum_i w_{is}$ is the number of users
with attribute $s$, be the empirical probability of $d$ being assigned
to a user with business attribute $s$. With these quantities, we define the binary entropy, the
conditional entropy, and the mutual information as:



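$$h(X_d) := -\sum_{x_d \in \{0,1\}} p(x_d) \log_2\left(p(x_d)\right) , \tag{29}$$

$$h(X_d \mid S) := -\sum_s p(s) \sum_{x_d \in \{0,1\}} p(x_d \mid S = s) \log_2\left(p(x_d \mid S = s)\right) , \tag{30}$$

$$I(X_d; S) := h(X_d) - h(X_d \mid S) . \tag{31}$$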

The entropy $h(X_d)$ quantifies the uncertainty about whether a user is assigned permission
$d$. The conditional entropy $h(X_d \mid S)$ is the uncertainty for a user whose business attribute
$S$ is known. The mutual information $I(X_d; S)$ is the absolute increment of information
about the user-permission assignment gained by knowledge of $S$. This number indicates how
relevant the attribute $S$ is for the assignment of permissions. There is one pitfall, though.
If one compares this score on different permissions $d$ with different entropies $h(X_d)$, then
low-entropy permissions will have a smaller score simply because there is little entropy to
reduce. We therefore compute the relative mutual information ([6], p. 45) as a measure of
relevance:

$$\rho_d(S) := I(X_d; S)/h(X_d) = 1 - h(X_d \mid S)/h(X_d) . \tag{32}$$

We use the convention $0/0 := 1$ for the case where $h(X_d) = 0$ (then $I(X_d; S)$ will also be
0). This number can be interpreted as the fraction of all bits in $X_d$ that are shared with $S$.
Alternatively, $\rho_d(S)$ can be read as the fraction of missing information about permission $d$
that is removed by the knowledge of $S$.
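The relevance score can be computed directly from one column of the access-control matrix and the users' attribute values. The following is a minimal sketch, assuming the conditional probabilities are estimated within each attribute group; the function names are illustrative.

```python
import math
from collections import defaultdict


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable, with 0 * log(0) := 0."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)


def relative_mutual_information(x_d, attrs):
    """rho_d(S) = 1 - h(X_d | S) / h(X_d) for one permission column x_d (0/1 list)."""
    N = len(x_d)
    h = binary_entropy(sum(x_d) / N)
    if h == 0.0:
        return 1.0                                   # convention 0/0 := 1
    groups = defaultdict(list)
    for x, s in zip(x_d, attrs):
        groups[s].append(x)
    # h(X_d | S) = sum_s p(s) * h(X_d | S = s), with p(s) = N_s / N
    h_cond = sum(len(g) / N * binary_entropy(sum(g) / len(g))
                 for g in groups.values())
    return 1.0 - h_cond / h
```

For example, `relative_mutual_information([1, 1, 0, 0], ["a", "a", "b", "b"])` evaluates to 1 because the attribute determines the permission, whereas the labels `["a", "b", "a", "b"]` give 0.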

Limit of few observations per business attribute. One should take care to use
sufficiently many observations when estimating the relevance $\rho_d(S)$ of a business attribute $S$.
With too few observations, this measure is biased towards high relevance. Imagine the
problem of estimating the entropy of a fair coin based on only one observation (being heads
without loss of generality). Naively computing the empirical probability of heads to be 1
provides an entropy of 0, which differs considerably from the true entropy of a fair coin, which
is 1 bit. The same effect occurs when one computes the permission entropy conditioned on
an irrelevant attribute where only one observation per attribute value is available. In [31], for
instance, the last name of a user was found to be highly relevant! A practical solution is to
compute $\rho_d(S)$ with only those values of $S$ where sufficiently many observations are available.
For instance, if more than 10 users have the feature $s$ = "Smith", the empirical probability
$p(x_d \mid S = s)$ will give a good estimate of $h(X_d \mid S = s)$. In our experiments, we neglected all
attribute values with fewer than 10 observations.
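The filtering step can be sketched as a small preprocessing helper that drops users whose attribute value is too rare before the relevance is estimated. The threshold default of 10 follows the text; the function names are hypothetical.

```python
from collections import Counter


def frequent_values(attrs, min_count=10):
    """Attribute values that are observed for at least min_count users."""
    counts = Counter(attrs)
    return {s for s, n in counts.items() if n >= min_count}


def filter_users(x_d, attrs, min_count=10):
    """Keep only the users whose attribute value is frequent enough."""
    keep = frequent_values(attrs, min_count)
    pairs = [(x, s) for x, s in zip(x_d, attrs) if s in keep]
    return [x for x, _ in pairs], [s for _, s in pairs]
```

The filtered permission column and attribute list can then be passed to whatever routine estimates the conditional entropy.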

We apply the proposed relevance measure to the two different user attributes of the 
LE-access dataset, the job code (JC) and the organizational unit (OU) of a user. The first 
attribute is each user's so-called job code, which is a number that indexes the kind of contract 
that the user has. We initially believed that this attribute would be highly relevant for each 
user's permission as it is indicative of the user's business tasks. We compute the relevance
of these two attributes for each permission. The results are depicted in Figure 10. In these
histograms, we count the number of permissions for which the respective user attribute has 
the given relevance score. As can be seen, the average reduction in entropy is much higher 
for the OU (87.8%) than for JC (49.7%). This result means that knowledge of the OU 
almost determines most of the permissions of the users while the JC provides relatively little 
information about the permissions. We therefore only use OU in our role mining experiment. 



6.2 Results for hybrid role mining 

We run experiments on the LE-access dataset. This time, we use the adapted E-step as
derived in Section 6 with the organizational unit (OU) of the user as the business attribute
$S$. Again, we randomly split the data into a training set and a test set, learn the roles on
the training set, and then compute the generalization error on the test set. This time, we fix
the number of roles to $k = 30$ and only study the influence of the business attributes on the
result by varying the mixing parameter $\lambda$.

Figure 10: Distribution of the measure of relevance Eq. (32), the mutual information weighted
with inverse permission entropy.



To evaluate how well the resulting role configuration corresponds to the business attributes 
of the user, we compute the average conditional role entropy: 

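$$h(\mathcal{L} \mid S) := -\frac{1}{N} \sum_{i=1}^{N} \sum_{\mathcal{L}} p(z_{i\mathcal{L}} \mid S = s_i) \log_2\left(p(z_{i\mathcal{L}} \mid S = s_i)\right) . \tag{33}$$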

This number indicates how hard it is to guess the role assignments $z_{i\mathcal{L}}$ of a user $i$ given his
organizational unit $s_i$. For a good RBAC configuration this entropy is small, meaning that
the knowledge of the organizational unit suffices to determine the set of roles that the user 
has. If the RBAC configuration does not correspond to the organizational units at all, then 
knowledge of this attribute does not provide information about a user's roles and, as a result, 
the role entropy is high. The advantage of the conditional role entropy over other measures of 
dispersion such as, for instance, nominal variance [5] or attribute-spread is that it directly 
resembles the task of an administrator who must assign roles to new users. When deciding 
which roles should be assigned to a user given his business attributes, there should be as 
little uncertainty as possible. For the same reason, we select relevant business information 
by relative mutual information instead of heuristic scores such as "mineability" [5] . 
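As an illustration of the average conditional role entropy, the sketch below estimates $p(z_{i\mathcal{L}} \mid S = s_i)$ by the relative frequency of role set $\mathcal{L}$ among the users with attribute $s_i$. That estimator is an assumption made here for the example, and the function name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_role_entropy(role_sets, attrs):
    """Average over users of -sum_L p(L | S = s_i) * log2 p(L | S = s_i).

    role_sets[i] is the (hashable) role set of user i, attrs[i] the attribute s_i.
    """
    by_attr = defaultdict(Counter)
    for L, s in zip(role_sets, attrs):
        by_attr[s][L] += 1

    def entropy(counter):
        n = sum(counter.values())
        return -sum(c / n * math.log2(c / n) for c in counter.values())

    return sum(entropy(by_attr[s]) for s in attrs) / len(attrs)
```

If every organizational unit maps to a single role set, the result is 0. If the users of a unit are spread evenly over two role sets, each of them contributes 1 bit.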



Figure 11: Generalization error versus conditional role entropy for the experiment using
organization units. The dashed lines represent the standard deviation over ten repeated
experiments.

We depict the results in Figure 11. We plot the role entropy against the generalization
error. Each point on this line is the median result over ten experiments with different random
splits into the training set and the test set. Each point is computed with a different mixing
parameter $\lambda$. The plot demonstrates that by increasing the influence of the organizational
unit, the role entropy decreases while the generalization error increases. This means
that hybrid role mining with this attribute requires a trade-off. Fortunately, most of the reduction in
role entropy can be achieved without significantly increasing the generalization error. When
further increasing $\lambda$, the generalization error increases without significantly improving role
entropy. This insensitivity indicates a Pareto-optimum (marked with a circle in Figure 11),
which defines the influence that one should give to user attributes in hybrid role mining.
Note that we are able to find this point (to produce Figure 11) without knowledge of the
true roles.



7 Related work 

Shortly after the development of RBAC P^, researchers and practitioners alike recognized 
the importance of methods for role engineering, see for instance [7]. As explained in the
introduction, these methods can be classified as being either top-down or bottom-up. Top- 
down methods use process descriptions, the organizational structure of the domain, or features 
of the employees as given by the Human Resources Department of an enterprise, to create 
roles. The authors of [32], for instance, present a scenario-driven approach where a scenario's
requirements are analyzed according to its associated tasks. Permissions are then granted that enable
the task to be completed. In [10], a workflow is proposed to manually engineer roles by
analyzing business processes. Today, all pure top-down approaches are manually carried out. 
Bottom-up role methods have the advantage that they can be partially or fully automated 
by role mining algorithms, thereby relieving administrators from this time-consuming and 
error-prone task. 

The first bottom-up role mining algorithm was proposed in [24] and coined role mining.
Since then, a number of different bottom-up approaches have been presented. For instance, 
the algorithm of [33] merges sets of permissions to construct a tree of candidate roles. After- 
wards, it selects the final roles from this tree such that, at every step, the permissions of a 
maximum number of users are covered. Subsequent role mining algorithms usually followed 
a similar structure: they first construct a set of candidate roles and afterwards greedily select 
roles from this set [37l STJ [2] . These algorithms differ from each other with respect to the 
proposal creation step or the cost function used to select roles. Vaidya et al. formally defined 
the role mining problem and variants in [35] and [36] and investigated the complexity of these
problems. The problem definitions proposed in these two papers differ from our role infer- 
ence problem in that either the number of roles or the number of residuals tolerated when 
fitting the RBAC configuration is given as input. Moreover, these definitions aim at a high 
compression rate, while role inference aims at discovering the latent roles that underlie the 
data. 

The results presented in this paper build upon our prior work. In [14], we analyze the
different definitions of role mining and define the role inference problem. In [13], we derive
DDM from the deterministic permission assignment rule in RBAC. In [34], we propose the
MAC model, which we explore further in [17], adding different noise processes. [15] focuses
on determining the number of roles for role mining in particular, and selecting the number
of patterns in unsupervised learning in general. Finally, in [16] we propose the hybrid role
mining algorithm that we revisit in Section 6.

In this paper, we extend and generalize our prior work. In particular, we draw connec- 
tions between all these concepts and approaches. We thereby generalize them within one 
consistent framework that covers i) the definition of role mining, ii) the approach to solve it, 
and iii) methods to evaluate solutions. While in [TJ] we motivate the role inference problem 
by analyzing the real- world requirements of RBAC, here we take a more direct approach. We 
identify the input that is usually available in realistic scenarios and directly derive the role 
inference problem from these assumptions. Moreover, we subsume the models of our prior 
work, DDM and MAC, within one core model. In this way, we analyze the relationships 
between them and highlight the influence of noise processes, role hierarchies, and constraints 
on role assignments. This theoretical comparison of the models is supplemented by an ex- 
perimental comparison of DDM, MAC, and other models and algorithms. In addition, we 
provide a sound measure of confidence for user permission assignments and investigate how
conservatively the proposed methods modify noisy assignments. We also show that DDM 
is structurally equivalent to a MAC model constrained to every user only having one role. 
Finally, our paper contains the first experimental comparison of DDM and MAC on both 
synthetic and real-world data. 

8 Conclusion 

We put forth that, in contrast to conventional approaches, role mining should be approached 
as a prediction problem, not as a compression problem. We proposed an alternative, the role
inference problem, with the goal of finding the RBAC configuration that most likely under-
lies a given access control matrix. This problem definition includes the hybrid role mining
scenario when additional information about the users or the system is available. To solve the
role inference problem, we derived a class of probabilistic models and analyzed several vari-
ants. On real-world access control matrices, our models demonstrate robust generalization 
ability while other methods are rather fragile and their generalization ability depends on the 
particular data analyzed. 
A Appendix

A.1 Marginalization

We convert the deterministic formula Eq. Q into a probabilistic version by marginalizing out 
the latent variables $u_{*d}$ from the joint distribution for $u_{*d}$ and $x_{id}$. The joint distribution is

$$p(x_{id} = 0, u_{*d} \mid \beta_{*d}, z_{i*}) = p(x_{id} = 0 \mid u_{*d}, z_{i*}) \prod_k p(u_{kd} \mid \beta_{kd}) . \tag{34}$$

As a single bit $x_{id}$ depends on the outcome of a full bit-vector $u_{*d}$ of length $K$, one must
marginalize over all realizations of such bit-vectors.

Let $\Omega$ be the set of all possible binary vectors $u_{*d}$. Then the likelihood for $x_{id} = 0$ is



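$$\begin{aligned}
p(x_{id} = 0 \mid \beta_{*d}, z_{i*})
&= \sum_{u_{*d} \in \Omega} p(x_{id} = 0, u_{*d} \mid \beta_{*d}, z_{i*}) \\
&= \sum_{u_{*d} \in \Omega} p(x_{id} = 0 \mid u_{*d}, z_{i*}) \prod_k p(u_{kd} \mid \beta_{kd}) \\
&= \sum_{u_{*d} \in \Omega} \Big( \prod_{k: z_{ik} = 1} (1 - u_{kd})\, \beta_{kd}^{1 - u_{kd}} (1 - \beta_{kd})^{u_{kd}} \Big) \Big( \prod_{k: z_{ik} = 0} \beta_{kd}^{1 - u_{kd}} (1 - \beta_{kd})^{u_{kd}} \Big)
\end{aligned} \tag{35-39}$$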


In the step from (38) to (39) we substituted the individual probabilities with their definitions
Eq. (3) and Eq. (6). In the last step, we separated the bit-vectors $u_{*d}$ into the two cases where
$z_{ik} = 1$ and $z_{ik} = 0$. The first case cancels all contributions of the sum where $z_{ik} = u_{kd} = 1$
and for $z_{ik} = 1, u_{kd} = 0$ only the factor $\beta_{kd}$ remains. Therefore, it is convenient to introduce
a modified set of bit-vectors $\Omega' = \{u_{*d} \in \Omega \mid u_{kd} = 0 \text{ for all } k \text{ with } z_{ik} = 1\} \subset \Omega$, i.e., the
entries of $u_{kd}$ that are relevant for object $i$ are fixed to 0. The likelihood then takes the
following compact form



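$$p(x_{id} = 0 \mid \beta_{*d}, z_{i*}) = \sum_{u_{*d} \in \Omega'} \Big( \prod_{k: z_{ik} = 1} \beta_{kd} \Big) \Big( \prod_{k: z_{ik} = 0} \beta_{kd}^{1 - u_{kd}} (1 - \beta_{kd})^{u_{kd}} \Big) . \tag{40}$$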



The sum in Eq. (40) has $|\Omega'|$ terms. Let us pick a particular $k''$ with $z_{ik''} = 0$. Half of
the vectors in $\Omega'$ have $u_{k''d} = 1$ and the other half have $u_{k''d} = 0$, whereas the remaining
bits $k \neq k''$ are equal in both halves. We can therefore factor out the terms where bit-vectors
differ only at position $k''$ in Eq. (40). This reduces the number of terms in the sum by a
factor of two, whereby the sum now ranges over the modified set $\Omega'' \subset \Omega'$, where all bits are
varied except for $k''$.

In the following steps, we recursively factor out such terms. This successively interchanges
the sum and the product in Eq. (40) and makes it easy to see, in Eq. (43), that all contributions
with $z_{ik} = 0$ sum up to 1.



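$$\begin{aligned}
p(x_{id} = 0 \mid \beta_{*d}, z_{i*})
&= \underbrace{\big( (1 - \beta_{k''d}) + \beta_{k''d} \big)}_{= 1 \;\forall k''} \sum_{u_{*d} \in \Omega''} \Big( \prod_{k: z_{ik} = 1} \beta_{kd} \Big) \Big( \prod_{\substack{k: z_{ik} = 0 \\ k \neq k''}} \beta_{kd}^{1 - u_{kd}} (1 - \beta_{kd})^{u_{kd}} \Big) \\
&= \Big( \prod_{k: z_{ik} = 1} \beta_{kd} \Big) \Big( \prod_{k: z_{ik} = 0} \underbrace{\sum_{u_{kd} \in \{0,1\}} \beta_{kd}^{1 - u_{kd}} (1 - \beta_{kd})^{u_{kd}}}_{= 1 \;\forall k} \Big) \\
&= \prod_k \beta_{kd}^{z_{ik}}
\end{aligned} \tag{41-43}$$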

A.2 Hierarchies

We compute the probability that a user is not assigned to a permission in a two-level role
hierarchy given the probabilities of independent role hierarchy parameters. We start from
Eq. ^.



K 



> fe=i 



Zik ^\\J Vki S yid 



l<k<K 



(44) 
(45) 



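$$= \prod_k \Big( p(\bar{z}_{ik} \wedge \bar{u}_{kd} \mid z^+_{ik}, y^+_{*d}, v^+_{k*}) + p(z_{ik} \wedge \bar{u}_{kd} \mid z^+_{ik}, y^+_{*d}, v^+_{k*}) + p(\bar{z}_{ik} \wedge u_{kd} \mid z^+_{ik}, y^+_{*d}, v^+_{k*}) \Big) \tag{46}$$

$$= \prod_k \Big( 1 - z^+_{ik} + z^+_{ik}\, p(\bar{u}_{kd} \mid y^+_{*d}, v^+_{k*}) \Big) \tag{47}$$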



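Note that in the step from (45) to (46), the correct probability is only obtained when summing
over the probabilities of exclusive events (in particular: $\overline{a \wedge b} = \bar{a} \vee \bar{b}$ but $p(\overline{a \wedge b}) \neq p(\bar{a}) + p(\bar{b})$).
Given the generation $u_{kd} = \bigvee_l v_{kl} \wedge y_{ld}$, we have that

$$p(\bar{u}_{kd} \mid y^+_{*d}, v^+_{k*}) = \prod_l p(\overline{v_{kl} \wedge y_{ld}} \mid y^+_{ld}, v^+_{kl}) = \prod_l \Big[ p(\bar{y}_{ld} \mid y^+_{ld}) + p(y_{ld} \mid y^+_{ld})\, p(\bar{v}_{kl} \mid v^+_{kl}) \Big] . \tag{48}$$

Substituting this into Eq. (47) yields

$$p(\bar{x}_{id} \mid z^+_{i*}, y^+_{*d}, v^+) = \prod_k \Big[ 1 - z^+_{ik} + z^+_{ik} \prod_l \Big( (1 - y^+_{ld}) + y^+_{ld} (1 - v^+_{kl}) \Big) \Big] . \tag{49}$$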



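A.3 Gibbs Sampling Algorithm for DDM

Here, we derive all necessary distributions for the Gibbs sampling algorithm for DDM in
Section 4.2. We start by collecting all distributions that define the model. The data likelihood
of the model is

$$p(X \mid Z, Y, \beta) = \prod_{k,l} (1 - \beta_{kl})^{n^{(1)}_{kl}}\, \beta_{kl}^{n^{(0)}_{kl}} , \tag{50}$$

with the counters $n^{(1)}_{kl} = \sum_{i,d:\, z_{ik} = 1,\, y_{ld} = 1} I_{\{x_{id} = 1\}}$ and $n^{(0)}_{kl} = \sum_{i,d:\, z_{ik} = 1,\, y_{ld} = 1} I_{\{x_{id} = 0\}}$. The parameters
$\beta_{kl}$ are random variables themselves. They follow the Beta distribution

$$P_b(\beta_{kl}; \gamma, \gamma) = B(\gamma, \gamma)^{-1}\, \beta_{kl}^{\gamma - 1} (1 - \beta_{kl})^{\gamma - 1} , \qquad B(\gamma, \gamma) = \frac{\Gamma(\gamma)^2}{\Gamma(2\gamma)} . \tag{51}$$

Here $B(\cdot, \cdot)$ is the beta function, also known as "Euler integral of the first kind".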

In each step, one must sample new assignments of user $i'$ to roles from the distribution

$$p(z_{i'k} = 1 \mid X, Z_{i \neq i' *}, Y) = \text{const} \cdot p(X \mid Z, Y)\, p(z_{i'k} = 1 \mid Z_{i \neq i' *}) . \tag{52}$$

Assigning permissions to roles (updating $Y$) has the same form, just with $Y$ and $Z$ inter-
changed. In order to compute this term for a particular user $i'$, one must compute the
evidence term $p(X \mid Z, Y)$ and the Dirichlet process prior $p(z_{i'k} = 1 \mid Z_{i \neq i' *})$ for all available
roles $k \in \{1, \dots, K\}$ and for a potential new role with index $K + 1$. We introduced the
Dirichlet process prior $p(z_{i'k} = 1 \mid Z_{i \neq i' *})$ in Section 3.4. The evidence term is

$$p(X \mid Z, Y) = \int p(X, \beta \mid Z, Y)\, d\beta = \frac{p(X \mid Z, Y, \beta)\, p(\beta \mid Z, Y)}{p(\beta \mid Z, Y, X)} . \tag{53}$$
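So far, we just applied Bayes' rule. The numerator is a product of terms that we already have:
the data likelihood in Eq. (50) and the Beta distributions $p(\beta \mid Z, Y) = \prod_{k,l} P_b(\beta_{kl}; \gamma, \gamma)$. The
term in the denominator is the posterior probability $p(\beta \mid Z, Y, X)$ (the probability of $\beta$ after
having observed the data $X$). We can rewrite this (again using Bayes) to

$$\begin{aligned}
p(\beta \mid Z, Y, X) &= \frac{p(X \mid Z, Y, \beta) \prod_{k,l} P_b(\beta_{kl}; \gamma, \gamma)}{p(X \mid Z, Y)} && (54) \\
&= \text{const} \cdot p(X \mid Z, Y, \beta) \prod_{k,l} P_b(\beta_{kl}; \gamma, \gamma) && (55) \\
&= \text{const} \cdot \prod_{k,l} (1 - \beta_{kl})^{n^{(1)}_{kl} + \gamma - 1}\, \beta_{kl}^{n^{(0)}_{kl} + \gamma - 1} . && (56)
\end{aligned}$$

Comparison with the Beta distribution makes it easy to identify how this probability must
be normalized. Therefore, we can analytically compute the posterior:

$$\begin{aligned}
p(\beta \mid Z, Y, X) &= \prod_{k,l} \frac{\Gamma(n^{(1)}_{kl} + n^{(0)}_{kl} + 2\gamma)}{\Gamma(n^{(1)}_{kl} + \gamma)\, \Gamma(n^{(0)}_{kl} + \gamma)} (1 - \beta_{kl})^{n^{(1)}_{kl} + \gamma - 1}\, \beta_{kl}^{n^{(0)}_{kl} + \gamma - 1} && (57) \\
&= \prod_{k,l} B(n^{(1)}_{kl} + \gamma, n^{(0)}_{kl} + \gamma)^{-1} (1 - \beta_{kl})^{n^{(1)}_{kl} + \gamma - 1}\, \beta_{kl}^{n^{(0)}_{kl} + \gamma - 1} . && (58)
\end{aligned}$$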



The Beta distribution is a conjugate prior of the Bernoulli distribution. As a consequence,
the posterior of a Bernoulli likelihood and a Beta prior again has the form of a Beta
distribution, as we just observed in the last derivation. Substituting the posterior back into
Eq. (53) results in the analytic expression of the evidence term.

$$p(X \mid Z, Y) = \prod_{k,l} \frac{B(n^{(1)}_{kl} + \gamma, n^{(0)}_{kl} + \gamma)}{B(\gamma, \gamma)} . \tag{59}$$

Computation of this term only involves updating the two counters.
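In practice the evidence is best evaluated in log space, since the Beta functions underflow quickly for realistic counts. The sketch below assumes the standard Beta-Bernoulli marginal likelihood, a product over blocks of $B(n^{(1)} + \gamma, n^{(0)} + \gamma) / B(\gamma, \gamma)$; the function names and the nested-list layout of the counters are illustrative choices.

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma, numerically stable for large counts."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_evidence(n1, n0, gamma):
    """Sum over blocks (k, l) of log B(n1 + gamma, n0 + gamma) - log B(gamma, gamma).

    n1[k][l] and n0[k][l] count the ones and zeros of X in block (k, l).
    """
    return sum(log_beta(a + gamma, b + gamma) - log_beta(gamma, gamma)
               for row1, row0 in zip(n1, n0)
               for a, b in zip(row1, row0))
```

When a single user is reassigned in a Gibbs step, only the counters of the affected blocks change, so only those summands need to be recomputed.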

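A.4 Marginalization of the noise indicator $\xi$

Let us assume the random noise indicator $\xi_{id}$ follows a Bernoulli distribution

$$p(\xi_{id} \mid \epsilon) = \epsilon^{\xi_{id}} (1 - \epsilon)^{1 - \xi_{id}} . \tag{60}$$

The structural bit $x^S_{id}$ is generated by the structure model $p_{\text{flat}}(x^S_{id} \mid Z, \beta)$ (Eq. 13), the noise bit
$x^N_{id}$ is generated by the Bernoulli distribution $p_N(x^N_{id} \mid r)$, and the observable bit is generated by

$$x_{id} = (1 - \xi_{id})\, x^S_{id} + \xi_{id}\, x^N_{id} . \tag{61}$$

Using these distributions, the joint probability of $X$ and $\xi$ is

$$p(X, \xi \mid Z, \beta, r, \epsilon) = \prod_{i,d} \big( \epsilon \cdot p_N(x_{id} \mid r) \big)^{\xi_{id}} \cdot \big( (1 - \epsilon) \cdot p_{\text{flat}}(x_{id} \mid Z, \beta) \big)^{1 - \xi_{id}} . \tag{62}$$

We sum over all possible outcomes of all noise indicators and obtain the final likelihood of
this mixture noise model.

$$\begin{aligned}
p_M(X \mid Z, \beta, r, \epsilon) &= \sum_{\xi} \prod_{i,d} \big( \epsilon \cdot p_N(x_{id} \mid r) \big)^{\xi_{id}} \cdot \big( (1 - \epsilon) \cdot p_{\text{flat}}(x_{id} \mid Z, \beta) \big)^{1 - \xi_{id}} && (63) \\
&= \prod_{i,d} \sum_{\xi_{id} \in \{0,1\}} \big( \epsilon \cdot p_N(x_{id} \mid r) \big)^{\xi_{id}} \cdot \big( (1 - \epsilon) \cdot p_{\text{flat}}(x_{id} \mid Z, \beta) \big)^{1 - \xi_{id}} && (64) \\
&= \prod_{i,d} \big( \epsilon \cdot p_N(x_{id} \mid r) + (1 - \epsilon) \cdot p_{\text{flat}}(x_{id} \mid Z, \beta) \big) . && (65)
\end{aligned}$$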
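The exchange of sum and product can be verified numerically for a handful of bits. In this sketch the structure model is represented only by its per-bit probability of a 1, passed in as `p_struct_one`, and the noise bit is 1 with probability `r`. Both are simplifying assumptions for illustration, and all names are hypothetical.

```python
from itertools import product


def bernoulli(x, p_one):
    """Probability of bit x under a Bernoulli distribution with p(x = 1) = p_one."""
    return p_one if x == 1 else 1.0 - p_one


def likelihood_summed(xs, p_struct_one, r, eps):
    """Explicit sum over all noise-indicator vectors xi."""
    total = 0.0
    for xi in product((0, 1), repeat=len(xs)):
        term = 1.0
        for x, flag, p1 in zip(xs, xi, p_struct_one):
            if flag:
                term *= eps * bernoulli(x, r)          # bit drawn from the noise model
            else:
                term *= (1 - eps) * bernoulli(x, p1)   # bit drawn from the structure
        total += term
    return total


def likelihood_mixture(xs, p_struct_one, r, eps):
    """Product over bits of the two-component mixture."""
    result = 1.0
    for x, p1 in zip(xs, p_struct_one):
        result *= eps * bernoulli(x, r) + (1 - eps) * bernoulli(x, p1)
    return result
```

Both functions return the same value up to rounding, but the mixture form is linear in the number of bits rather than exponential.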
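A.5 Update equations for deterministic annealing

We define the empirical risk of assigning a user $i$ to the set of roles $\mathcal{L}$ as the negative log-
likelihood of the permissions $x_{i*}$ being generated by role set $\mathcal{L}$, given parameters $\theta = (\beta, \epsilon, r)$:

$$R_{i\mathcal{L}} := -\log p(x_{i*} \mid \mathcal{L}, \theta) = -\sum_d \log\big( x_{id} (1 - q_{\mathcal{L}d}) + (1 - x_{id})\, q_{\mathcal{L}d} \big) ,$$

where the auxiliary variable $q_{\mathcal{L}d} := \epsilon (1 - r) + (1 - \epsilon)\, \beta_{\mathcal{L}d}$ is the probability of $x_{id} = 0$, so that
$1 - q_{\mathcal{L}d} = \epsilon r + (1 - \epsilon)(1 - \beta_{\mathcal{L}d})$.
The responsibility $\gamma_{i\mathcal{L}}$ of the assignment-set $\mathcal{L}$ for data item $i$ is given by

$$\gamma_{i\mathcal{L}} := \frac{\exp(-R_{i\mathcal{L}}/T)}{\sum_{\mathcal{L}'} \exp(-R_{i\mathcal{L}'}/T)} . \tag{66}$$

In this way, the matrix $\gamma$ defines a probability distribution over the space of all clustering
solutions. The expected empirical risk $\mathbb{E}_G[R]$ of the solutions under this distribution $G$ is
$\mathbb{E}_G[R_{i\mathcal{L}}] = \sum_i \sum_{\mathcal{L}} \gamma_{i\mathcal{L}} R_{i\mathcal{L}}$. Finally, the partition function $Z$ and the free energy $F$ are
defined as follows.

$$Z := \prod_i \sum_{\mathcal{L}} \exp(-R_{i\mathcal{L}}/T) , \qquad F := -T \log Z = -T \sum_i \log\Big( \sum_{\mathcal{L}} \exp(-R_{i\mathcal{L}}/T) \Big) . \tag{67}$$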
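The risk and the free energy can be sketched as follows. The code assumes that $q_{\mathcal{L}d}$ denotes the probability of $x_{id} = 0$, i.e. $q_{\mathcal{L}d} = \epsilon(1 - r) + (1 - \epsilon) \prod_{k \in \mathcal{L}} \beta_{kd}$ with $r$ the probability that a noise bit is 1; this convention and the function names are assumptions of the example.

```python
import math


def risk(x, L, beta, eps, r):
    """R_iL = -sum_d log(x_d * (1 - q_Ld) + (1 - x_d) * q_Ld)."""
    total = 0.0
    for d, xd in enumerate(x):
        beta_Ld = math.prod(beta[k][d] for k in L)    # p(structural bit = 0)
        q = eps * (1 - r) + (1 - eps) * beta_Ld       # p(x_d = 0)
        total -= math.log(xd * (1 - q) + (1 - xd) * q)
    return total


def free_energy(R, T):
    """F = -T * sum_i log sum_L exp(-R_iL / T), using a log-sum-exp shift."""
    F = 0.0
    for row in R:
        m = min(row)
        F += m - T * math.log(sum(math.exp(-(v - m) / T) for v in row))
    return F
```

At low temperature the free energy approaches the sum of each user's minimal risk, which is why annealing $T$ toward zero turns the soft responsibilities into hard assignments.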



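Given the above, we derive the updates of the model parameters based on the first-order
condition of the free energy $F$. We therefore introduce the generic model parameter $\theta$, which
stands for any of the model parameters, i.e. $\theta \in \{\beta_{\mu\nu}, \epsilon, r\}$. Here, $\mu$ is some particular value
of role index $k$ and $\nu$ is some particular value of permission index $d$. Using this notation, the
derivative of the free energy with respect to $\theta$ is given by

$$\frac{\partial F}{\partial \theta} = \sum_i \sum_{\mathcal{L}} \gamma_{i\mathcal{L}} \frac{\partial R_{i\mathcal{L}}}{\partial \theta} = \sum_i \sum_{\mathcal{L}} \gamma_{i\mathcal{L}} \sum_d \frac{(2 x_{id} - 1)\, \frac{\partial q_{\mathcal{L}d}}{\partial \theta}}{x_{id} (1 - q_{\mathcal{L}d}) + (1 - x_{id})\, q_{\mathcal{L}d}} . \tag{68}$$

The partial derivatives with respect to $\theta \in \{\beta_{\mu\nu}, \epsilon, r\}$ are:

$$\frac{\partial q_{\mathcal{L}d}}{\partial \beta_{\mu\nu}} = (1 - \epsilon)\, \beta_{\mathcal{L} \setminus \{\mu\}, d}\, I_{\{\nu = d\}}\, I_{\{\mu \in \mathcal{L}\}} , \qquad \frac{\partial q_{\mathcal{L}d}}{\partial \epsilon} = 1 - r - \beta_{\mathcal{L}d} , \qquad \frac{\partial q_{\mathcal{L}d}}{\partial r} = -\epsilon . \tag{69}$$

This results in the following first-order conditions for the individual role parameter updates
and the noise parameter updates:

$$\sum_{\mathcal{L}: \mu \in \mathcal{L}} \beta_{\mathcal{L} \setminus \{\mu\}, \nu} \Big\{ \sum_{i: x_{i\nu} = 1} \frac{\gamma_{i\mathcal{L}}}{\epsilon r + (1 - \epsilon)(1 - \beta_{\mathcal{L}\nu})} - \sum_{i: x_{i\nu} = 0} \frac{\gamma_{i\mathcal{L}}}{1 - \epsilon r - (1 - \epsilon)(1 - \beta_{\mathcal{L}\nu})} \Big\} = 0 ,$$

$$\sum_d \sum_{\mathcal{L}} (1 - r - \beta_{\mathcal{L}d}) \Big\{ \sum_{i: x_{id} = 1} \frac{\gamma_{i\mathcal{L}}}{\epsilon r + (1 - \epsilon)(1 - \beta_{\mathcal{L}d})} - \sum_{i: x_{id} = 0} \frac{\gamma_{i\mathcal{L}}}{1 - \epsilon r - (1 - \epsilon)(1 - \beta_{\mathcal{L}d})} \Big\} = 0 ,$$

$$\sum_d \sum_{\mathcal{L}} \Big\{ \sum_{i: x_{id} = 1} \frac{\gamma_{i\mathcal{L}}}{\epsilon r + (1 - \epsilon)(1 - \beta_{\mathcal{L}d})} - \sum_{i: x_{id} = 0} \frac{\gamma_{i\mathcal{L}}}{1 - \epsilon r - (1 - \epsilon)(1 - \beta_{\mathcal{L}d})} \Big\} = 0 .$$

Here, $\beta_{\mathcal{L} \setminus \{\mu\}, \nu} = \prod_{k \in \mathcal{L}, k \neq \mu} \beta_{k\nu}$, and $\sum_{\mathcal{L}: \mu \in \mathcal{L}}$ sums over all role sets containing role $\mu$.

As there is no analytic expression for the solutions of the above equations, we use Newton's
method to find the roots.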
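To illustrate the root-finding step, the sketch below applies Newton's method to a deliberately reduced version of the condition for $r$: a single role set and a single permission, with total responsibility `n1` on users holding the permission and `n0` on users lacking it. The reduction, the toy numbers, and the function names are assumptions made for this example only.

```python
def newton(f, df, x0, tol=1e-12, max_iter=100):
    """Plain Newton iteration x <- x - f(x) / f'(x)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x


# Toy setting (hypothetical): fixed eps and beta_Ld, responsibilities summed per bit value.
eps, beta_Ld, n1, n0 = 0.5, 0.8, 3.0, 7.0


def p_one(r):
    """Probability of a 1 under the mixture: eps * r + (1 - eps) * (1 - beta_Ld)."""
    return eps * r + (1 - eps) * (1 - beta_Ld)


def f(r):
    """First-order condition for r in the reduced setting."""
    return n1 / p_one(r) - n0 / (1 - p_one(r))


def df(r):
    """Derivative of f with respect to r."""
    return -eps * n1 / p_one(r) ** 2 - eps * n0 / (1 - p_one(r)) ** 2


r_hat = newton(f, df, x0=0.5)
```

In this reduced setting the root is where the mixture probability of a 1 equals the empirical fraction `n1 / (n1 + n0)`, which gives a simple check on the iteration. In the full model the same iteration would be run on the summed conditions, with the responsibilities held fixed during each M-step.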